# Llama-3-8B-Instruct GPTQ 4-bit (Medical Optimized)

This is a 4-bit GPTQ-quantized version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
## 🏥 Medical Domain Optimization

This model was quantized with medical-domain calibration data so that the 4-bit weights better preserve accuracy on clinical and healthcare text.
### Calibration Dataset
- PubMedQA (60%): Medical literature Q&A
- PMC-Patients (40%): Clinical case reports
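A minimal sketch of how a 60/40 calibration mix of 256 samples could be assembled with the `datasets` library; the dataset IDs (`qiaojin/PubMedQA`, `zhengyun21/PMC-Patients`) and field names below are assumptions, not the exact preprocessing used for this checkpoint.

```python
# Sketch: build a 60/40 PubMedQA / PMC-Patients calibration mix (256 samples).
# Dataset IDs, config names, and text fields are assumptions for illustration.
from datasets import load_dataset

N_SAMPLES = 256
n_pubmedqa = int(N_SAMPLES * 0.6)    # 153 medical-literature Q&A examples
n_patients = N_SAMPLES - n_pubmedqa  # 103 clinical case reports

pubmedqa = load_dataset("qiaojin/PubMedQA", "pqa_labeled", split="train")
patients = load_dataset("zhengyun21/PMC-Patients", split="train")

calibration_texts = (
    [f"{ex['question']} {ex['long_answer']}" for ex in pubmedqa.select(range(n_pubmedqa))]
    + [ex["patient"] for ex in patients.select(range(n_patients))]
)
```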
### Use Cases
- Radiology report summarization
- Clinical documentation assistance
- Medical literature Q&A
- Patient-facing health information
## Important Notes
⚠️ Validation Required: All medical outputs should be reviewed by qualified healthcare professionals. This model is a tool to assist, not replace, medical judgment.
## Model Details
- Base Model: meta-llama/Meta-Llama-3-8B-Instruct
- Quantization: 4-bit GPTQ
- Group Size: 128
- Calibration: Medical domain mix (PubMedQA + PMC-Patients)
- Calibration Samples: 256
- Model Size: 5.3 GB
- Compression: 3.0x smaller than FP16
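(Sanity check on the compression figure: ~8B parameters at FP16 is roughly 8 × 2 bytes ≈ 16 GB, and 16 GB ÷ 5.3 GB ≈ 3.0×.)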
## Usage
```python
# Loading GPTQ weights through transformers requires the optimum and auto-gptq packages.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nalrunyan/llama3-8b-gptq-4bit")
model = AutoModelForCausalLM.from_pretrained("nalrunyan/llama3-8b-gptq-4bit", device_map="auto")

# Generate a short continuation from a plain prompt.
prompt = "Explain the diagnosis:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
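Llama-3-Instruct checkpoints are trained with a chat template, so instruction-style prompts usually work better through `tokenizer.apply_chat_template`; the sketch below uses illustrative message content.

```python
# Sketch: prompt via the Llama-3 chat template (message content is illustrative).
messages = [
    {"role": "system", "content": "You are a clinical assistant. Outputs must be reviewed by a qualified professional."},
    {"role": "user", "content": "Summarize the key findings of this radiology report: ..."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```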
## Quantization Details
This model was quantized using GPTQ with:
- Bits: 4
- Group size: 128
- Backend: AutoGPTQ
Created on Kaggle with 2x T4 GPUs.
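For reference, a minimal sketch of an AutoGPTQ run matching these settings; the `desc_act` choice, output paths, and the `calibration_texts` list are placeholders, not the exact script used for this checkpoint.

```python
# Sketch: 4-bit GPTQ quantization with AutoGPTQ (bits=4, group_size=128).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)  # desc_act is an assumption
model = AutoGPTQForCausalLM.from_pretrained(base, quantize_config)

# AutoGPTQ expects tokenized calibration examples (input_ids / attention_mask).
examples = [
    tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    for text in calibration_texts  # placeholder: the medical calibration mix described above
]
model.quantize(examples)

model.save_quantized("llama3-8b-gptq-4bit", use_safetensors=True)
tokenizer.save_pretrained("llama3-8b-gptq-4bit")
```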