---
license: llama3
language: en
library_name: unsloth
tags:
- unsloth
- llama-3.2
- vision-language-model
- ecg
- cardiology
- lora
- medical-imaging
- text-generation
- convaiinnovations
base_model: unsloth/Llama-3.2-11B-Vision-Instruct
datasets:
- ECGInstruct
---

# High-Accuracy ECG Image Interpretation with LLaMA 3.2

This repository contains the official fine-tuned model from the paper: **"High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2"**.

**Paper:** [arXiv:2501.18670](https://arxiv.org/abs/2501.18670)

This model was developed by **Nandakishor M** and **Anjali M** at **Convai Innovations**. It is designed to provide high-accuracy, comprehensive interpretation of electrocardiogram (ECG) images.

## Model Details

* **Base Model:** `unsloth/Llama-3.2-11B-Vision-Instruct`
* **Fine-tuning Strategy:** Parameter-Efficient LoRA
* **Dataset:** `ECGInstruct`, a large-scale dataset with 1 million instruction-following samples derived from public sources like MIMIC-IV ECG and PTB-XL.
* **Primary Use:** Automated analysis and report generation from ECG images to assist cardiologists and medical professionals in diagnosing a wide range of cardiac conditions.

## How to Use

This model was trained using [Unsloth](https://github.com/unslothai/unsloth) to achieve high performance and memory efficiency. The following code provides a complete example of how to load the model in 4-bit precision and run inference.

You can run this code on a free Google Colab instance: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1bL9z0NU8kuUyYescSJTIpP9NEkF2Dk6o?usp=sharing)

```python
import torch
from unsloth import FastVisionModel
from transformers import TextStreamer
from PIL import Image
from IPython.display import display

# Make sure you have an ECG image file, e.g., 'my_ecg.jpg'
image_path = "my_ecg.jpg"

# Load the 4-bit quantized model and processor
model, processor = FastVisionModel.from_pretrained(
    model_name="convaiinnovations/ECG-Instruct-Llama-3.2-11B-Vision",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
    device_map="cuda",
)

# Enable fast inference mode
FastVisionModel.for_inference(model)

# Load the image
image = Image.open(image_path).convert("RGB")

# Define the instruction
query = "You are an expert cardiologist. Write an in-depth diagnosis report from this ECG data, including the final diagnosis."

# Prepare the prompt
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": query},
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)

# Process inputs. The chat template already inserts the begin-of-text token,
# so special tokens are not added a second time here.
inputs = processor(
    text=input_text,
    images=image,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

# Set up streamer for token-by-token output
text_streamer = TextStreamer(processor.tokenizer, skip_prompt=True)

# Generate the report
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=512,
    use_cache=True,
    temperature=0.2,
    min_p=0.1,
)

# To see the input image in a notebook:
# display(image.resize((600, 400)))
```

## Training and Fine-tuning

The model was fine-tuned on the `ECGInstruct` dataset using a parameter-efficient LoRA strategy, which significantly improves performance on ECG interpretation tasks while preserving the base model's extensive knowledge.
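
To make "parameter-efficient" concrete, the sketch below counts the trainable weights that a single LoRA adapter adds to one linear layer, using the rank and alpha values listed under Key Hyperparameters. The 4096 x 4096 projection size is an assumption chosen for illustration, not a figure taken from the paper, and the `alpha / r` scaling is the standard LoRA formulation rather than a confirmed detail of this training run. The helper names are hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / rank


RANK = 64      # LoRA rank from the hyperparameter list
ALPHA = 128    # LoRA alpha from the hyperparameter list
HIDDEN = 4096  # ASSUMPTION: illustrative square projection size

full_params = HIDDEN * HIDDEN
adapter_params = lora_trainable_params(HIDDEN, HIDDEN, RANK)
fraction = adapter_params / full_params
scaling = lora_scaling(ALPHA, RANK)

print(f"Frozen weights in the layer : {full_params:,}")
print(f"Trainable adapter weights   : {adapter_params:,}")
print(f"Trainable fraction          : {fraction:.2%}")
print(f"Update scaling (alpha / r)  : {scaling}")
```

Under these assumptions the adapter trains roughly 3% of the weights of the layer it wraps, while the original weights stay frozen.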
### Key Hyperparameters:

- **LoRA Rank (`r`):** 64
- **LoRA Alpha (`alpha`):** 128
- **LoRA Dropout:** 0.05
- **Learning Rate:** 2e-4 with a cosine scheduler
- **Epochs:** 3
- **Hardware:** 4x NVIDIA A100 80GB GPUs
- **Framework:** Unsloth with DeepSpeed ZeRO-2

*Note: As described in the paper, the `lm_head` and `embed_tokens` layers were excluded from LoRA adaptation to maintain generation stability.*

## Evaluation

The fine-tuned model demonstrates state-of-the-art performance, significantly outperforming the baseline LLaMA 3.2 model across all metrics.

| Task                  | Metric                          | Baseline | **Ours (Fine-tuned)** |
|-----------------------|---------------------------------|----------|-----------------------|
| Abnormality Detection | AUC                             | 0.51     | **0.98**              |
| Abnormality Detection | Macro F1                        | 0.33     | **0.74**              |
| Abnormality Detection | Hamming Loss (lower is better)  | 0.49     | **0.11**              |
| Report Generation     | Report Score                    | 47.8     | **85.4**              |

*Report Score was evaluated using GPT-4o against expert-annotated ground truth reports.*

## Citation

If you use this model in your research, please cite our paper:

```bibtex
@misc{nandakishor2025highaccuracy,
  title={High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2},
  author={Nandakishor M and Anjali M},
  year={2025},
  eprint={2501.18670},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
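
## Notes on the Evaluation Metrics

For readers who want to compute Macro F1 and Hamming Loss on their own predictions, the following is a minimal sketch of the usual multi-label definitions. It is not the paper's evaluation code; the function names and the toy label matrices are hypothetical and exist only to show the arithmetic.

```python
from typing import List

Labels = List[List[int]]  # one row per ECG, one 0/1 column per abnormality


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # A label with no positives and no predictions scores 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Hypothetical toy data: 4 ECGs, 3 abnormality labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

print(f"Hamming loss: {hamming_loss(y_true, y_pred):.3f}")
print(f"Macro F1:     {macro_f1(y_true, y_pred):.3f}")
```

Hamming Loss counts every wrong label slot equally, so lower is better, whereas Macro F1 averages per-label scores and therefore gives rare abnormalities the same weight as common ones.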