MEDFIT-LLM-3B: Fine-tuned Llama-3.2-3B for Medical QA

MEDFIT-LLM-3B is a specialized language model fine-tuned from Meta's Llama-3.2-3B-Instruct for healthcare and medical question-answering applications. This model demonstrates significant improvements in direct answer capabilities and medical domain understanding through domain-focused fine-tuning.

Model Details

Model Description

MEDFIT-LLM-3B is a 3 billion parameter language model specifically optimized for healthcare chatbot applications. The model was fine-tuned using LoRA (Low-Rank Adaptation) techniques on a carefully curated dataset of healthcare-related questions and answers, resulting in enhanced performance for medical information dissemination and patient education.

  • Developed by: Aditya Karnam Gururaj Rao, Arjun Jaggi, Sonam Naidu
  • Model type: Causal Language Model (Fine-tuned)
  • Language(s): English
  • License: Llama 3.2 Community License
  • Finetuned from model: meta-llama/Llama-3.2-3B-Instruct
  • Fine-tuning method: LoRA (Low-Rank Adaptation)
  • Training framework: MLX

Model Sources

Performance Highlights

Based on comprehensive evaluation against the base Llama-3.2-3B-Instruct model:

  • Direct Answer Improvement: 30 percentage point increase (from 6.0% to 36.0%)
  • Response Structure: 18% increase in numbered list usage for better organization
  • Overall Improvement Score: 108.2 (highest among evaluated models)
  • Response Length: Slight increase (+2.84%) with more comprehensive answers

Uses

Direct Use

MEDFIT-LLM-3B is designed for healthcare chatbot applications where accurate, well-structured medical information delivery is crucial. The model excels at:

  • Medical Question Answering: Providing direct, accurate responses to healthcare queries
  • Patient Education: Delivering structured, easy-to-understand medical information
  • Healthcare Information Dissemination: Supporting healthcare providers with reliable AI assistance
  • Medical Chatbot Applications: Serving as the backbone for healthcare conversational agents
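
The following is a minimal, chatbot-style sketch of these use cases. The system prompt, generation settings, and multi-turn wrapper are illustrative assumptions rather than a configuration from the paper, and it assumes the tokenizer retains the chat template of the Llama-3.2-3B-Instruct base.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("adityak74/medfit-llm-3B")
model = AutoModelForCausalLM.from_pretrained("adityak74/medfit-llm-3B")

# Illustrative system prompt; adapt it to your deployment's safety and scope requirements.
messages = [
    {"role": "system", "content": "You are a patient-education assistant. Give clear, structured answers and remind users to consult a qualified healthcare professional."},
]

def ask(question):
    """Append a user turn, generate a reply, and keep it in the conversation history."""
    messages.append({"role": "user", "content": question})
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    messages.append({"role": "assistant", "content": reply})
    return reply

print(ask("What are the common symptoms of diabetes?"))
print(ask("How is it usually diagnosed?"))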

Downstream Use

The model can be integrated into:

  • Healthcare mobile applications
  • Medical information systems
  • Patient support platforms
  • Telemedicine chatbots
  • Medical education tools

Out-of-Scope Use

Important: This model is NOT intended for:

  • Medical diagnosis or treatment recommendations
  • Emergency medical situations
  • Replacement of professional medical advice
  • Clinical decision-making without human oversight
  • Prescription or medication recommendations

Training Details

Training Data

The dataset is available at: https://huggingface.co/datasets/mlx-community/medfit-dataset

The model was trained on a carefully curated dataset comprising:

  • Total samples: 6,444 unique healthcare-related question-answer pairs
  • Training set: 5,155 samples
  • Validation set: 644 samples
  • Test set: 645 samples

The dataset was created using:

  • Synthetic data generation: 10,000 initial samples generated using Phi-4
  • Domain-specific curation: Healthcare-focused questions derived from existing research
  • Deduplication: Filtered to remove duplicates, resulting in 6,444 unique samples
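
A minimal sketch of loading and re-splitting the published dataset with the Hugging Face datasets library. The split name, column layout, and the 80/10/10 re-split below are assumptions; the original split procedure and seed are not published.

from datasets import load_dataset

# Load the published MEDFIT dataset (assumed to expose a single "train" split).
ds = load_dataset("mlx-community/medfit-dataset", split="train")
print(ds)  # inspect columns and the number of samples

# Reproduce an 80/10/10 split in the spirit of the one described above.
splits = ds.train_test_split(test_size=0.2, seed=42)
holdout = splits["test"].train_test_split(test_size=0.5, seed=42)
train_set, valid_set, test_set = splits["train"], holdout["train"], holdout["test"]
print(len(train_set), len(valid_set), len(test_set))  # approximately 5,155 / 644 / 645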

Training Procedure

Fine-tuning Method

  • Technique: LoRA (Low-Rank Adaptation)
  • Framework: MLX
  • Base model: meta-llama/Llama-3.2-3B-Instruct
  • Focus: Healthcare domain specialization

Training Hyperparameters

  • Fine-tuning approach: Domain-focused LoRA adaptation
  • Dataset split: 80% training, 10% validation, 10% testing
  • Training regime: Optimized for healthcare question-answering performance
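
The exact hyperparameters are not published. As a hedged sketch, LoRA fine-tuning with the MLX framework is typically launched through the mlx-lm package roughly as follows; the data path and numeric values below are placeholders, not the settings used for MEDFIT-LLM-3B.

import subprocess

# Illustrative invocation of mlx-lm's LoRA trainer; --data points to a directory
# containing train.jsonl / valid.jsonl files in the format mlx-lm expects.
subprocess.run([
    "python", "-m", "mlx_lm.lora",
    "--model", "meta-llama/Llama-3.2-3B-Instruct",
    "--train",
    "--data", "./medfit-data",
    "--batch-size", "4",   # placeholder
    "--iters", "1000",     # placeholder
], check=True)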

Evaluation

Testing Data & Metrics

The model was evaluated on:

  • 50 healthcare-specific validation questions
  • Comparative analysis against base Llama-3.2-3B-Instruct
  • Multi-dimensional assessment including direct answer capability, response structure, and generation efficiency
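
The exact scoring rule behind the direct answer rate is not reproduced here. The sketch below is an illustrative heuristic in the spirit of the Glossary definition (responses that open with an answer rather than a preamble); the preamble patterns are assumptions.

import re

# Illustrative preamble patterns; the published evaluation may use different criteria.
PREAMBLE_PATTERNS = [
    r"^as an ai",
    r"^i('| a)m (not )?a (doctor|medical professional)",
    r"^it('| i)s important to (note|remember)",
    r"^thank you for",
]

def is_direct_answer(response):
    """Return True if the response does not open with a hedging preamble."""
    opening = response.strip().lower()
    return not any(re.match(p, opening) for p in PREAMBLE_PATTERNS)

def direct_answer_rate(responses):
    """Percentage of responses that begin with a direct answer."""
    return 100.0 * sum(is_direct_answer(r) for r in responses) / len(responses)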

Key Results

Direct Answer Performance:

  • Base model: 6.0% direct answer rate
  • Fine-tuned model: 36.0% direct answer rate
  • Improvement: +30.0 percentage points

Response Quality:

  • Enhanced structure with increased use of numbered lists (+18%)
  • Improved organization and systematic presentation
  • Better alignment with healthcare communication standards

Generation Efficiency:

  • Slight increase in generation time (+1.6%)
  • Trade-off between response quality and speed
  • Overall positive impact on response comprehensiveness

Bias, Risks, and Limitations

Limitations

  • Not a substitute for professional medical advice
  • May generate plausible-sounding but incorrect medical information
  • Limited to English language medical contexts
  • Training data may not cover all medical specialties equally
  • Performance may vary across different healthcare subdomains

Recommendations

  • Always verify medical information with qualified healthcare professionals
  • Use as a supplementary tool rather than primary medical resource
  • Implement human oversight in all healthcare applications
  • Regular updates needed to maintain medical accuracy as knowledge evolves
  • Consider integration with retrieval-augmented generation (RAG) for enhanced factual accuracy
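
To illustrate the RAG recommendation above, the sketch below grounds the prompt in retrieved reference text before generation. The in-memory snippet store and keyword-overlap retriever are toy stand-ins for a vetted medical knowledge base and a proper embedding-based retriever.

# Toy "knowledge base"; replace with a curated medical corpus in practice.
SNIPPETS = [
    "Common symptoms of type 2 diabetes include increased thirst, frequent urination, and fatigue.",
    "Hypertension is often asymptomatic and is diagnosed through repeated blood pressure measurements.",
]

def retrieve(query, k=1):
    """Rank snippets by naive keyword overlap with the query."""
    q_tokens = set(query.lower().split())
    return sorted(SNIPPETS, key=lambda s: len(q_tokens & set(s.lower().split())), reverse=True)[:k]

def build_prompt(question):
    """Prepend retrieved context so the model can ground its answer."""
    context = "\n".join(retrieve(question))
    return f"Use the following reference material to answer.\n\nReference:\n{context}\n\nQuestion: {question}"

# The resulting prompt can be passed to the model as shown in the
# "How to Get Started with the Model" section below.
print(build_prompt("What are the common symptoms of diabetes?"))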

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("adityak74/medfit-llm-3B")
model = AutoModelForCausalLM.from_pretrained("adityak74/medfit-llm-3B")

# Example usage
prompt = "What are the common symptoms of diabetes?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,              # passes input_ids and attention_mask
    max_new_tokens=200,    # cap the number of newly generated tokens
    do_sample=True,        # required for temperature to take effect
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
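
Because the model was fine-tuned with MLX, it can also be run on Apple silicon with the mlx-lm package. This is a minimal sketch and assumes the published weights load (or are converted on the fly) via mlx_lm.load.

from mlx_lm import load, generate

# Load the model and tokenizer with MLX
model, tokenizer = load("adityak74/medfit-llm-3B")

# Generate a response
response = generate(model, tokenizer, prompt="What are the common symptoms of diabetes?", max_tokens=200)
print(response)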

Environmental Impact

The fine-tuning process utilized efficient LoRA techniques to minimize computational requirements while maximizing performance improvements. This approach reduces the environmental impact compared to full model training while achieving significant domain-specific enhancements.

Citation

BibTeX:

@inproceedings{rao2025medfit,
  title={MEDFIT-LLM: Medical Enhancements through Domain-Focused Fine Tuning of Small Language Models},
  author={Rao, Aditya Karnam Gururaj and Jaggi, Arjun and Naidu, Sonam},
  booktitle={2025 2nd International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE)},
  year={2025},
  organization={IEEE}
}

APA: Rao, A. K. G., Jaggi, A., & Naidu, S. (2025). MEDFIT-LLM: Medical Enhancements through Domain-Focused Fine Tuning of Small Language Models. In 2025 2nd International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE). IEEE.

Glossary

  • QA: Question Answering
  • EHR: Electronic Health Record
  • LoRA: Low-Rank Adaptation - an efficient fine-tuning technique
  • MLX: Apple's machine-learning framework for Apple silicon, used here for fine-tuning
  • Direct Answer Rate: Percentage of responses that begin with direct answers rather than preambles

Model Card Authors

  • Aditya Karnam Gururaj Rao (Zefr Inc, LA, USA)
  • Arjun Jaggi (HCLTech, LA, USA)
  • Sonam Naidu (LexisNexis, USA)

Model Card Contact

Disclaimer

This model is designed for educational and informational purposes in healthcare contexts. It should not be used as a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of qualified healthcare providers with questions regarding medical conditions or treatments.
