Llama-3-8B-Instruct GPTQ 4-bit (Medical Optimized)

This is a 4-bit GPTQ quantized version of meta-llama/Meta-Llama-3-8B-Instruct.

🏥 Medical Domain Optimization

This model was quantized using a medical-domain calibration set so that the 4-bit weights better preserve accuracy on clinical and healthcare applications.

Calibration Dataset

  • PubMedQA (60%): Medical literature Q&A
  • PMC-Patients (40%): Clinical case reports
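
As an illustration, a 256-sample calibration mix in roughly this ratio could be assembled with the Hugging Face datasets library as sketched below. The dataset IDs, splits, and text fields (qiaojin/PubMedQA, zhengyun21/PMC-Patients) are assumptions for the sketch, not a record of the exact preprocessing used for this checkpoint.

from datasets import load_dataset
import random

random.seed(0)

# Assumed dataset IDs and fields; adjust to the sources actually used.
pubmedqa = load_dataset("qiaojin/PubMedQA", "pqa_labeled", split="train")
pmc = load_dataset("zhengyun21/PMC-Patients", split="train")

n_total = 256
n_pubmedqa = int(n_total * 0.6)   # 60% medical literature Q&A
n_pmc = n_total - n_pubmedqa      # 40% clinical case reports

calibration_texts = (
    [ex["question"] + " " + ex["long_answer"] for ex in pubmedqa.select(range(n_pubmedqa))]
    + [ex["patient"] for ex in pmc.select(range(n_pmc))]
)
random.shuffle(calibration_texts)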

Use Cases

  • Radiology report summarization
  • Clinical documentation assistance
  • Medical literature Q&A
  • Patient-facing health information

Important Notes

⚠️ Validation Required: All medical outputs should be reviewed by qualified healthcare professionals. This model is a tool to assist, not replace, medical judgment.

Model Details

  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct
  • Quantization: 4-bit GPTQ
  • Group Size: 128
  • Calibration: Medical domain mix (PubMedQA + PMC-Patients)
  • Calibration Samples: 256
  • Model Size: 5.3 GB
  • Compression: ~3.0× smaller than the FP16 base model (≈16 GB → 5.3 GB)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading a GPTQ checkpoint through transformers requires the optimum and
# auto-gptq (or gptqmodel) packages to be installed alongside transformers.
tokenizer = AutoTokenizer.from_pretrained("nalrunyan/llama3-8b-gptq-4bit")
model = AutoModelForCausalLM.from_pretrained(
    "nalrunyan/llama3-8b-gptq-4bit",
    device_map="auto",  # spread layers across available GPUs automatically
)

prompt = "Explain the diagnosis:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
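
Since the base model is an Instruct checkpoint, prompts are typically wrapped in the Llama 3 chat template. A short sketch follows; the system and user messages are illustrative placeholders.

messages = [
    {"role": "system", "content": "You are a clinical documentation assistant. A qualified clinician must review all outputs."},
    {"role": "user", "content": "Summarize the key findings of this radiology report: ..."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))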

Quantization Details

This model was quantized using GPTQ with:

  • Bits: 4
  • Group size: 128
  • Backend: AutoGPTQ

Created on Kaggle with 2x T4 GPUs.
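
For reference, a quantization run with these settings looks roughly like the sketch below. It assumes the calibration_texts list from the calibration section and fills in options the card does not document (e.g. desc_act, sequence length, output path), so treat it as an outline rather than the exact script used.

from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

base = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)

# 4-bit weights with 128-column groups; desc_act=False is an assumption.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(base, quantize_config)

# Tokenize the 256 medical calibration samples for the GPTQ pass.
examples = [
    tokenizer(text, truncation=True, max_length=2048, return_tensors="pt")
    for text in calibration_texts
]
model.quantize(examples)
model.save_quantized("llama3-8b-gptq-4bit", use_safetensors=True)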
