Llama-3-8B-Instruct GPTQ 4-bit (Medical Optimized)

This is a 4-bit GPTQ quantized version of meta-llama/Meta-Llama-3-8B-Instruct.

🏥 Medical Domain Optimization

This model was quantized using a medical-domain calibration set so that the 4-bit weights better preserve accuracy on clinical and healthcare applications.

Calibration Dataset

  • PubMedQA (60%): Medical literature Q&A
  • PMC-Patients (40%): Clinical case reports
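
As an illustration, a 256-sample calibration mix in roughly this ratio could be assembled with the Hugging Face datasets library as sketched below. The dataset IDs, splits, and text fields (qiaojin/PubMedQA, zhengyun21/PMC-Patients) are assumptions for the sketch, not a record of the exact preprocessing used for this checkpoint.

from datasets import load_dataset
import random

random.seed(0)

# Assumed dataset IDs and fields; adjust to the sources actually used.
pubmedqa = load_dataset("qiaojin/PubMedQA", "pqa_labeled", split="train")
pmc = load_dataset("zhengyun21/PMC-Patients", split="train")

n_total = 256
n_pubmedqa = int(n_total * 0.6)   # 60% medical literature Q&A
n_pmc = n_total - n_pubmedqa      # 40% clinical case reports

calibration_texts = (
    [ex["question"] + " " + ex["long_answer"] for ex in pubmedqa.select(range(n_pubmedqa))]
    + [ex["patient"] for ex in pmc.select(range(n_pmc))]
)
random.shuffle(calibration_texts)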

Use Cases

  • Radiology report summarization
  • Clinical documentation assistance
  • Medical literature Q&A
  • Patient-facing health information

Important Notes

⚠️ Validation Required: All medical outputs should be reviewed by qualified healthcare professionals. This model is a tool to assist, not replace, medical judgment.

Model Details

  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct
  • Quantization: 4-bit GPTQ
  • Group Size: 128
  • Calibration: Medical domain mix (PubMedQA + PMC-Patients)
  • Calibration Samples: 256
  • Model Size: 5.3 GB
  • Compression: ~3.0× smaller than the FP16 base model (≈16 GB → 5.3 GB)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading a GPTQ checkpoint through transformers requires the optimum and
# auto-gptq (or gptqmodel) packages to be installed alongside transformers.
tokenizer = AutoTokenizer.from_pretrained("nalrunyan/llama3-8b-gptq-4bit")
model = AutoModelForCausalLM.from_pretrained(
    "nalrunyan/llama3-8b-gptq-4bit",
    device_map="auto",  # spread layers across available GPUs automatically
)

prompt = "Explain the diagnosis:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
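
Since the base model is an Instruct checkpoint, prompts are typically wrapped in the Llama 3 chat template. A short sketch follows; the system and user messages are illustrative placeholders.

messages = [
    {"role": "system", "content": "You are a clinical documentation assistant. A qualified clinician must review all outputs."},
    {"role": "user", "content": "Summarize the key findings of this radiology report: ..."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))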

Quantization Details

This model was quantized using GPTQ with:

  • Bits: 4
  • Group size: 128
  • Backend: AutoGPTQ

Created on Kaggle with 2x T4 GPUs.
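
For reference, a quantization run with these settings looks roughly like the sketch below. It assumes the calibration_texts list from the calibration section and fills in options the card does not document (e.g. desc_act, sequence length, output path), so treat it as an outline rather than the exact script used.

from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

base = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)

# 4-bit weights with 128-column groups; desc_act=False is an assumption.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(base, quantize_config)

# Tokenize the 256 medical calibration samples for the GPTQ pass.
examples = [
    tokenizer(text, truncation=True, max_length=2048, return_tensors="pt")
    for text in calibration_texts
]
model.quantize(examples)
model.save_quantized("llama3-8b-gptq-4bit", use_safetensors=True)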
