Model Card: Lekhansh/Llama-3.2-3B-It-EHR-TextsimplificationAndIE
Model Overview
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct designed for two core tasks in addiction psychiatry clinical workflows:
- Proofreading and standardizing unstructured clinical notes (CNs) from Electronic Health Records (EHR).
- Extracting structured substance use information, specifically substance presence and last use timing.
The model was developed on CNs from a five-year EHR dataset (2018–2023), annotated by doctors and nurses for gold-standard benchmarking. It outperformed baseline methods (JamSpell, medSpaCy contextual corrector) and GPT-4o on both proofreading and information extraction. Human raters were unable to reliably distinguish model-edited notes from human-edited ones and preferred model outputs in a majority of cases. Despite strong overall performance (mean F1: 0.99), performance on rarer substance classes such as hallucinogens remains limited.
Preprint: https://osf.io/preprints/osf/d5m6e_v1
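A minimal inference sketch, assuming this repository hosts a LoRA adapter on top of the base model and that standard transformers + PEFT loading applies; the prompt wording is illustrative, not necessarily the exact instruction template used in training:

```python
# Minimal inference sketch. Assumes the repo hosts a LoRA adapter over
# meta-llama/Llama-3.2-3B-Instruct; the prompt text is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "Lekhansh/Llama-3.2-3B-It-EHR-TextsimplificationAndIE"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

note = "pt c/o cravng alchol, lst use 2 days bck"  # raw clinical note (made up)
messages = [
    {"role": "user",
     "content": f"Proofread this clinical note and extract substance use as JSON:\n{note}"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings mirror the validation-time generation settings listed below.
out = model.generate(inputs, max_new_tokens=512, do_sample=True,
                     temperature=0.1, top_p=0.95)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```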
Dataset
- Source: 6,500 addiction psychiatry clinical notes from NIMHANS EHR (2018–2023)
- Annotations: By qualified clinical staff
- Access: Dataset is not publicly available. Researchers may request access after clearance from the NIMHANS data safety board. A small sample is accessible at https://docs.google.com/spreadsheets/d/1JbBlDxFYZCXuvGJL06gDxwzi2GQIYZ-BZ1R6wRX93gI/edit?usp=sharing.
Training Details
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Framework: TRL (Transformer Reinforcement Learning) 
- Training examples: 5,563; Validation set: 686 
- Tokens seen: 1,216,042 
- LoRA Adapter Rank: 64 
- Quantisation: 4-bit quantised fine-tuning (QLoRA)
- Hyperparameters:
  - Learning rate: 1e-5
  - Scheduler: Cosine
  - Epochs: 3
  - Batch size: 4
  - Gradient accumulation: 5
- Generation during validation:
  - Temperature: 0.1
  - Top-p: 0.95
- Hardware: A6000 (48GB), single GPU 
- Training time: ~6 hours 
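For orientation, the setup above can be sketched with TRL's SFTTrainer roughly as follows. Only the hyperparameters listed above are documented; the LoRA alpha/dropout, NF4 quantisation details, and dataset path are assumptions:

```python
# QLoRA fine-tuning sketch matching the listed hyperparameters.
# LoRA alpha/dropout, quantisation details, and the dataset path are assumed.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

base_id = "meta-llama/Llama-3.2-3B-Instruct"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Rank 64 as documented; alpha/dropout/target modules are illustrative defaults.
peft_cfg = LoraConfig(r=64, lora_alpha=128, lora_dropout=0.05, task_type="CAUSAL_LM")

args = SFTConfig(
    output_dir="llama32-ehr-sft",
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=5,
)

# The NIMHANS dataset is private; this path is a placeholder. The dataset is
# expected to carry a "text" column (or chat messages) for SFTTrainer.
train_ds = load_dataset("json", data_files="train.jsonl")["train"]

trainer = SFTTrainer(model=model, args=args, train_dataset=train_ds,
                     peft_config=peft_cfg)
trainer.train()
```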
Evaluation
- Proofreading:
  - Increased readability: Flesch–Kincaid grade level 16 -> 9
  - Reduced out-of-vocabulary terms: 5% -> 1%
  - Human evaluation: only 27.9% identification accuracy (model vs. human); 55.7% preference for model output
  - Similarity to human-corrected notes: METEOR: 0.86, BERTScore: 0.85, token-level F1: 0.73 (see the reproduction sketch after this list)
- Information extraction:
  - Mean F1 score: 0.99
  - Limited performance on rare classes (e.g., hallucinogens)
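The similarity metrics above could be reproduced along these lines with the Hugging Face evaluate library; the token-level F1 helper is our own simple illustration and may differ from the paper's exact definition:

```python
# Similarity between model-corrected and human-corrected notes.
# METEOR and BERTScore via evaluate; token-level F1 is an illustrative
# bag-of-tokens implementation, not necessarily the paper's definition.
from collections import Counter
import evaluate

meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

def token_f1(pred: str, ref: str) -> float:
    p, r = pred.split(), ref.split()
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

preds = ["patient reports craving for alcohol; last use two days ago"]
refs = ["patient reports craving for alcohol, last used two days ago"]

print(meteor.compute(predictions=preds, references=refs)["meteor"])
print(bertscore.compute(predictions=preds, references=refs, lang="en")["f1"])
print(token_f1(preds[0], refs[0]))
```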
 
Intended Use
- Suitable for:
  - Research on standardization and information extraction from EHR clinical notes
  - Academic benchmarking and prototyping
- Limitations:
  - Not tested outside addiction psychiatry or Indian EHR data
  - Not validated for deployment in clinical decision support
  - Should not be used in production settings
- Output format: trained to emit structured JSON (an illustrative parsing sketch follows)
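The exact JSON schema is defined by the (private) training data and is not documented here; the field names below are purely hypothetical, shown only to illustrate consuming a structured response:

```python
# Parsing a structured JSON extraction result. The schema shown is
# hypothetical; consult the paper/authors for the actual field names.
import json

raw = '{"substances": [{"name": "alcohol", "present": true, "last_use": "2 days ago"}]}'
record = json.loads(raw)
for s in record.get("substances", []):
    print(s["name"], s["present"], s["last_use"])
```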
Model Architecture
- Size: 3.2B parameters
- Quantisation: 4-bit
- Adapter Type: LoRA (Rank 64)
Licensing
- License: For academic research use only
- Usage Restriction: Commercial and clinical use is prohibited without explicit permission. Contact author for details.