---
license: llama3
language: en
library_name: unsloth
tags:
- unsloth
- llama-3.2
- vision-language-model
- ecg
- cardiology
- lora
- medical-imaging
- text-generation
- convaiinnovations
base_model: unsloth/Llama-3.2-11B-Vision-Instruct
datasets:
- ECGInstruct
---
# High-Accuracy ECG Image Interpretation with LLaMA 3.2
This repository contains the official fine-tuned model from the paper: **"High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2"**.
**Paper:** [arXiv:2501.18670](https://arxiv.org/abs/2501.18670)
This model was developed by **Nandakishor M** and **Anjali M** at **Convai Innovations**. It is designed to provide high-accuracy, comprehensive interpretation of electrocardiogram (ECG) images.
## Model Details
* **Base Model:** `unsloth/Llama-3.2-11B-Vision-Instruct`
* **Fine-tuning Strategy:** Parameter-Efficient LoRA
* **Dataset:** `ECGInstruct`, a large-scale dataset with 1 million instruction-following samples derived from public sources like MIMIC-IV ECG and PTB-XL.
* **Primary Use:** Automated analysis and report generation from ECG images to assist cardiologists and medical professionals in diagnosing a wide range of cardiac conditions.
## How to Use
This model was trained using [Unsloth](https://github.com/unslothai/unsloth) to achieve high performance and memory efficiency. The following code provides a complete example of how to load the model in 4-bit precision and run inference.
You can also run this code in a free Google Colab notebook: [Open in Colab](https://colab.research.google.com/drive/1bL9z0NU8kuUyYescSJTIpP9NEkF2Dk6o?usp=sharing)
```python
import torch
from unsloth import FastVisionModel
from transformers import AutoProcessor, TextStreamer
from PIL import Image
from IPython.display import display
# Make sure you have an ECG image file, e.g., 'my_ecg.jpg'
image_path = "my_ecg.jpg"
# Load the 4-bit quantized model and processor
model, processor = FastVisionModel.from_pretrained(
    model_name="convaiinnovations/ECG-Instruct-Llama-3.2-11B-Vision",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
    device_map="cuda",
)
# Enable fast inference mode
FastVisionModel.for_inference(model)
# Load the image
image = Image.open(image_path).convert("RGB")
# Define the instruction
query = "You are an expert cardiologist. Write an in-depth diagnosis report from this ECG data, including the final diagnosis."
# Prepare the prompt
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": query},
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
# Process inputs
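inputs = processor(
    text=input_text,
    images=image,
    # The chat template already inserts the begin-of-text token,
    # so do not add special tokens a second time.
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")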
# Set up streamer for token-by-token output
text_streamer = TextStreamer(processor.tokenizer, skip_prompt=True)
# Generate the report
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=512,
    use_cache=True,
    temperature=0.2,
    min_p=0.1,
)
# To see the input image in a notebook:
# display(image.resize((600, 400)))
```
## Training and Fine-tuning
The model was fine-tuned on the `ECGInstruct` dataset using a parameter-efficient LoRA strategy, which significantly improves performance on ECG interpretation tasks while preserving the base model's extensive knowledge.
### Key Hyperparameters:
- **LoRA Rank (`r`):** 64
- **LoRA Alpha (`alpha`):** 128
- **LoRA Dropout:** 0.05
- **Learning Rate:** 2e-4 with a cosine scheduler
- **Epochs:** 3
- **Hardware:** 4x NVIDIA A100 80GB GPUs
- **Framework:** Unsloth with DeepSpeed ZeRO-2
*Note: As described in the paper, the `lm_head` and `embed_tokens` layers were excluded from LoRA adaptation to maintain generation stability.*
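To make the numbers above concrete, the following is a minimal sketch of what they imply under the standard LoRA formulation. In that formulation the low-rank update is scaled by `alpha / r`, and each adapted linear layer gains `r * (d_in + d_out)` trainable parameters. The 4096 x 4096 layer shape, the step count, and the absence of warmup in the cosine schedule are assumptions chosen for illustration. They are not taken from the paper or from the training code.

```python
import math

LORA_RANK = 64    # r, from the hyperparameter list
LORA_ALPHA = 128  # alpha, from the hyperparameter list
PEAK_LR = 2e-4    # learning rate, from the hyperparameter list


def lora_scale(alpha: int, r: int) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_out x d_in linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr to 0. Assumes no warmup phase."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Hypothetical square projection layer, used only to illustrate the arithmetic.
d = 4096
added = lora_param_count(d, d, LORA_RANK)

print("update scale:", lora_scale(LORA_ALPHA, LORA_RANK))
print("added params:", added, "fraction of layer:", added / (d * d))
print("lr at start, midpoint, end:",
      [cosine_lr(s, 1000, PEAK_LR) for s in (0, 500, 1000)])
```

With `r = 64` and `alpha = 128` the update scale is 2, and for the hypothetical layer the adapter adds roughly 3% of the layer's original parameter count.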
## Evaluation
The fine-tuned model substantially outperforms the baseline LLaMA 3.2 model on every reported metric. For Hamming Loss, lower is better; for all other metrics, higher is better.
| Task                  | Metric       | Baseline | **Ours (Fine-tuned)** |
|-----------------------|--------------|----------|-----------------------|
| Abnormality Detection | AUC          | 0.51     | **0.98**              |
|                       | Macro F1     | 0.33     | **0.74**              |
|                       | Hamming Loss | 0.49     | **0.11**              |
| Report Generation     | Report Score | 47.8     | **85.4**              |
*Report Score was evaluated using GPT-4o against expert-annotated ground truth reports.*
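For readers unfamiliar with the multi-label metrics in the table, the sketch below shows how Hamming Loss and Macro F1 are conventionally computed from binary label matrices. It is not the authors' evaluation code. The toy labels are invented for illustration, and scoring a label with no true or predicted positives as F1 = 0 is an assumed convention.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no positives at all scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy example: 4 ECGs, 3 hypothetical abnormality labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

print("Hamming loss:", round(hamming_loss(y_true, y_pred), 4))
print("Macro F1:", round(macro_f1(y_true, y_pred), 4))
```

Hamming Loss counts every wrong label slot equally, while Macro F1 averages per-label F1 scores so that rare abnormalities weigh as much as common ones.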
## Citation
If you use this model in your research, please cite our paper:
```bibtex
@misc{nandakishor2025highaccuracy,
title={High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2},
author={Nandakishor M and Anjali M},
year={2025},
eprint={2501.18670},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```