MEDFIT-LLM-3B: Fine-tuned Llama-3.2-3B for Medical QA

MEDFIT-LLM-3B is a specialized language model fine-tuned from Meta's Llama-3.2-3B-Instruct for healthcare and medical question-answering applications. This model demonstrates significant improvements in direct answer capabilities and medical domain understanding through domain-focused fine-tuning.

Model Details

Model Description

MEDFIT-LLM-3B is a 3 billion parameter language model specifically optimized for healthcare chatbot applications. The model was fine-tuned using LoRA (Low-Rank Adaptation) techniques on a carefully curated dataset of healthcare-related questions and answers, resulting in enhanced performance for medical information dissemination and patient education.

  • Developed by: Aditya Karnam Gururaj Rao, Arjun Jaggi, Sonam Naidu
  • Model type: Causal Language Model (Fine-tuned)
  • Language(s): English
  • License: Llama 3.2 Community License
  • Finetuned from model: meta-llama/Llama-3.2-3B-Instruct
  • Fine-tuning method: LoRA (Low-Rank Adaptation)
  • Training framework: MLX

Model Sources

Performance Highlights

Based on comprehensive evaluation against the base Llama-3.2-3B-Instruct model:

  • Direct Answer Improvement: 30 percentage point increase (from 6.0% to 36.0%)
  • Response Structure: 18% increase in numbered list usage for better organization
  • Overall Improvement Score: 108.2 (highest among evaluated models)
  • Response Length: Slight increase (+2.84%) with more comprehensive answers

Uses

Direct Use

MEDFIT-LLM-3B is designed for healthcare chatbot applications where accurate, well-structured medical information delivery is crucial. The model excels at:

  • Medical Question Answering: Providing direct, accurate responses to healthcare queries
  • Patient Education: Delivering structured, easy-to-understand medical information
  • Healthcare Information Dissemination: Supporting healthcare providers with reliable AI assistance
  • Medical Chatbot Applications: Serving as the backbone for healthcare conversational agents
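
The following is a minimal, chatbot-style sketch of these use cases. The system prompt, generation settings, and multi-turn wrapper are illustrative assumptions rather than a configuration from the paper, and it assumes the tokenizer retains the chat template of the Llama-3.2-3B-Instruct base.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("adityak74/medfit-llm-3B")
model = AutoModelForCausalLM.from_pretrained("adityak74/medfit-llm-3B")

# Illustrative system prompt; adapt it to your deployment's safety and scope requirements.
messages = [
    {"role": "system", "content": "You are a patient-education assistant. Give clear, structured answers and remind users to consult a qualified healthcare professional."},
]

def ask(question):
    """Append a user turn, generate a reply, and keep it in the conversation history."""
    messages.append({"role": "user", "content": question})
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    messages.append({"role": "assistant", "content": reply})
    return reply

print(ask("What are the common symptoms of diabetes?"))
print(ask("How is it usually diagnosed?"))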

Downstream Use

The model can be integrated into:

  • Healthcare mobile applications
  • Medical information systems
  • Patient support platforms
  • Telemedicine chatbots
  • Medical education tools

Out-of-Scope Use

Important: This model is NOT intended for:

  • Medical diagnosis or treatment recommendations
  • Emergency medical situations
  • Replacement of professional medical advice
  • Clinical decision-making without human oversight
  • Prescription or medication recommendations

Training Details

Training Data

The dataset is available at: https://huggingface.co/datasets/mlx-community/medfit-dataset

The model was trained on a carefully curated dataset comprising:

  • Total samples: 6,444 unique healthcare-related question-answer pairs
  • Training set: 5,155 samples
  • Validation set: 644 samples
  • Test set: 645 samples

The dataset was created using:

  • Synthetic data generation: 10,000 initial samples generated using Phi-4
  • Domain-specific curation: Healthcare-focused questions derived from existing research
  • Deduplication: Filtered to remove duplicates, resulting in 6,444 unique samples
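
A minimal sketch of loading and re-splitting the published dataset with the Hugging Face datasets library. The split name, column layout, and the 80/10/10 re-split below are assumptions; the original split procedure and seed are not published.

from datasets import load_dataset

# Load the published MEDFIT dataset (assumed to expose a single "train" split).
ds = load_dataset("mlx-community/medfit-dataset", split="train")
print(ds)  # inspect columns and the number of samples

# Reproduce an 80/10/10 split in the spirit of the one described above.
splits = ds.train_test_split(test_size=0.2, seed=42)
holdout = splits["test"].train_test_split(test_size=0.5, seed=42)
train_set, valid_set, test_set = splits["train"], holdout["train"], holdout["test"]
print(len(train_set), len(valid_set), len(test_set))  # approximately 5,155 / 644 / 645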

Training Procedure

Fine-tuning Method

  • Technique: LoRA (Low-Rank Adaptation)
  • Framework: MLX
  • Base model: meta-llama/Llama-3.2-3B-Instruct
  • Focus: Healthcare domain specialization

Training Hyperparameters

  • Fine-tuning approach: Domain-focused LoRA adaptation
  • Dataset split: 80% training, 10% validation, 10% testing
  • Training regime: Optimized for healthcare question-answering performance
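
The exact hyperparameters are not published. As a hedged sketch, LoRA fine-tuning with the MLX framework is typically launched through the mlx-lm package roughly as follows; the data path and numeric values below are placeholders, not the settings used for MEDFIT-LLM-3B.

import subprocess

# Illustrative invocation of mlx-lm's LoRA trainer; --data points to a directory
# containing train.jsonl / valid.jsonl files in the format mlx-lm expects.
subprocess.run([
    "python", "-m", "mlx_lm.lora",
    "--model", "meta-llama/Llama-3.2-3B-Instruct",
    "--train",
    "--data", "./medfit-data",
    "--batch-size", "4",   # placeholder
    "--iters", "1000",     # placeholder
], check=True)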

Evaluation

Testing Data & Metrics

The model was evaluated on:

  • 50 healthcare-specific validation questions
  • Comparative analysis against base Llama-3.2-3B-Instruct
  • Multi-dimensional assessment including direct answer capability, response structure, and generation efficiency
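
The exact scoring rule behind the direct answer rate is not reproduced here. The sketch below is an illustrative heuristic in the spirit of the Glossary definition (responses that open with an answer rather than a preamble); the preamble patterns are assumptions.

import re

# Illustrative preamble patterns; the published evaluation may use different criteria.
PREAMBLE_PATTERNS = [
    r"^as an ai",
    r"^i('| a)m (not )?a (doctor|medical professional)",
    r"^it('| i)s important to (note|remember)",
    r"^thank you for",
]

def is_direct_answer(response):
    """Return True if the response does not open with a hedging preamble."""
    opening = response.strip().lower()
    return not any(re.match(p, opening) for p in PREAMBLE_PATTERNS)

def direct_answer_rate(responses):
    """Percentage of responses that begin with a direct answer."""
    return 100.0 * sum(is_direct_answer(r) for r in responses) / len(responses)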

Key Results

Direct Answer Performance:

  • Base model: 6.0% direct answer rate
  • Fine-tuned model: 36.0% direct answer rate
  • Improvement: +30.0 percentage points

Response Quality:

  • Enhanced structure with increased use of numbered lists (+18%)
  • Improved organization and systematic presentation
  • Better alignment with healthcare communication standards

Generation Efficiency:

  • Slight increase in generation time (+1.6%)
  • Trade-off between response quality and speed
  • Overall positive impact on response comprehensiveness

Bias, Risks, and Limitations

Limitations

  • Not a substitute for professional medical advice
  • May generate plausible-sounding but incorrect medical information
  • Limited to English language medical contexts
  • Training data may not cover all medical specialties equally
  • Performance may vary across different healthcare subdomains

Recommendations

  • Always verify medical information with qualified healthcare professionals
  • Use as a supplementary tool rather than primary medical resource
  • Implement human oversight in all healthcare applications
  • Regular updates needed to maintain medical accuracy as knowledge evolves
  • Consider integration with retrieval-augmented generation (RAG) for enhanced factual accuracy
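
To illustrate the RAG recommendation above, the sketch below grounds the prompt in retrieved reference text before generation. The in-memory snippet store and keyword-overlap retriever are toy stand-ins for a vetted medical knowledge base and a proper embedding-based retriever.

# Toy "knowledge base"; replace with a curated medical corpus in practice.
SNIPPETS = [
    "Common symptoms of type 2 diabetes include increased thirst, frequent urination, and fatigue.",
    "Hypertension is often asymptomatic and is diagnosed through repeated blood pressure measurements.",
]

def retrieve(query, k=1):
    """Rank snippets by naive keyword overlap with the query."""
    q_tokens = set(query.lower().split())
    return sorted(SNIPPETS, key=lambda s: len(q_tokens & set(s.lower().split())), reverse=True)[:k]

def build_prompt(question):
    """Prepend retrieved context so the model can ground its answer."""
    context = "\n".join(retrieve(question))
    return f"Use the following reference material to answer.\n\nReference:\n{context}\n\nQuestion: {question}"

# The resulting prompt can be passed to the model as shown in the
# "How to Get Started with the Model" section below.
print(build_prompt("What are the common symptoms of diabetes?"))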

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("adityak74/medfit-llm-3B")
model = AutoModelForCausalLM.from_pretrained("adityak74/medfit-llm-3B")

# Example usage
prompt = "What are the common symptoms of diabetes?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,              # passes input_ids and attention_mask
    max_new_tokens=200,    # cap the number of newly generated tokens
    do_sample=True,        # required for temperature to take effect
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
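
Because the model was fine-tuned with MLX, it can also be run on Apple silicon with the mlx-lm package. This is a minimal sketch and assumes the published weights load (or are converted on the fly) via mlx_lm.load.

from mlx_lm import load, generate

# Load the model and tokenizer with MLX
model, tokenizer = load("adityak74/medfit-llm-3B")

# Generate a response
response = generate(model, tokenizer, prompt="What are the common symptoms of diabetes?", max_tokens=200)
print(response)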

Environmental Impact

The fine-tuning process utilized efficient LoRA techniques to minimize computational requirements while maximizing performance improvements. This approach reduces the environmental impact compared to full model training while achieving significant domain-specific enhancements.

Citation

BibTeX:

@inproceedings{rao2025medfit,
  title={MEDFIT-LLM: Medical Enhancements through Domain-Focused Fine Tuning of Small Language Models},
  author={Rao, Aditya Karnam Gururaj and Jaggi, Arjun and Naidu, Sonam},
  booktitle={2025 2nd International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE)},
  year={2025},
  organization={IEEE}
}

APA: Rao, A. K. G., Jaggi, A., & Naidu, S. (2025). MEDFIT-LLM: Medical Enhancements through Domain-Focused Fine Tuning of Small Language Models. In 2025 2nd International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE). IEEE.

Glossary

  • QA: Question Answering
  • EHR: Electronic Health Record
  • LoRA: Low-Rank Adaptation - an efficient fine-tuning technique
  • MLX: Apple's machine-learning framework for Apple silicon, used here for fine-tuning
  • Direct Answer Rate: Percentage of responses that begin with direct answers rather than preambles

Model Card Authors

  • Aditya Karnam Gururaj Rao (Zefr Inc, LA, USA)
  • Arjun Jaggi (HCLTech, LA, USA)
  • Sonam Naidu (LexisNexis, USA)

Model Card Contact

Disclaimer

This model is designed for educational and informational purposes in healthcare contexts. It should not be used as a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of qualified healthcare providers with questions regarding medical conditions or treatments.
