---
license: llama3
language: en
library_name: unsloth
tags:
- unsloth
- llama-3.2
- vision-language-model
- ecg
- cardiology
- lora
- medical-imaging
- text-generation
- convaiinnovations
base_model: unsloth/Llama-3.2-11B-Vision-Instruct
datasets:
- ECGInstruct
---

# High-Accuracy ECG Image Interpretation with LLaMA 3.2

This repository contains the official fine-tuned model from the paper: **"High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2"**.

**Paper:** [arXiv:2501.18670](https://arxiv.org/abs/2501.18670)

This model was developed by **Nandakishor M** and **Anjali M** at **Convai Innovations**. It is designed to provide high-accuracy, comprehensive interpretation of electrocardiogram (ECG) images.

## Model Details

* **Base Model:** `unsloth/Llama-3.2-11B-Vision-Instruct`
* **Fine-tuning Strategy:** Parameter-Efficient LoRA
* **Dataset:** `ECGInstruct`, a large-scale dataset with 1 million instruction-following samples derived from public sources like MIMIC-IV ECG and PTB-XL.
* **Primary Use:** Automated analysis and report generation from ECG images to assist cardiologists and medical professionals in diagnosing a wide range of cardiac conditions.

## How to Use

This model was trained using [Unsloth](https://github.com/unslothai/unsloth) to achieve high performance and memory efficiency. The following code provides a complete example of how to load the model in 4-bit precision and run inference.

You can run the code in a free Google Colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1bL9z0NU8kuUyYescSJTIpP9NEkF2Dk6o?usp=sharing)

```python
import torch
from unsloth import FastVisionModel
from transformers import AutoProcessor, TextStreamer
from PIL import Image
from IPython.display import display

# Make sure you have an ECG image file, e.g., 'my_ecg.jpg'
image_path = "my_ecg.jpg"

# Load the 4-bit quantized model and processor
model, processor = FastVisionModel.from_pretrained(
    model_name="convaiinnovations/ECG-Instruct-Llama-3.2-11B-Vision",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
    device_map="cuda"
)

# Enable fast inference mode
FastVisionModel.for_inference(model)

# Load the image
image = Image.open(image_path).convert("RGB")

# Define the instruction
query = "You are an expert cardiologist. Write an in-depth diagnosis report from this ECG data, including the final diagnosis."

# Prepare the prompt
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": query}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)

# Process inputs
inputs = processor(
    text=input_text,
    images=image,
    return_tensors="pt",
).to("cuda")

# Set up streamer for token-by-token output
text_streamer = TextStreamer(processor.tokenizer, skip_prompt=True)

# Generate the report
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=512,
    use_cache=True,
    temperature=0.2,
    min_p=0.1,
)

# To see the input image in a notebook:
# display(image.resize((600, 400)))
```

## Training and Fine-tuning

The model was fine-tuned on the `ECGInstruct` dataset using a parameter-efficient LoRA strategy, which significantly improves performance on ECG interpretation tasks while preserving the base model's extensive knowledge.

### Key Hyperparameters:
- **LoRA Rank (`r`):** 64
- **LoRA Alpha (`alpha`):** 128
- **LoRA Dropout:** 0.05
- **Learning Rate:** 2e-4 with a cosine scheduler
- **Epochs:** 3
- **Hardware:** 4x NVIDIA A100 80GB GPUs
- **Framework:** Unsloth with DeepSpeed ZeRO-2

*Note: As described in the paper, the `lm_head` and `embed_tokens` layers were excluded from LoRA adaptation to maintain generation stability.*
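To make the parameter efficiency concrete, here is a back-of-the-envelope sketch of how many trainable parameters LoRA adds per weight matrix at the rank and alpha listed above. The 4096×4096 projection size is an illustrative assumption for a single attention matrix, not a figure from the paper:

```python
# LoRA freezes a weight W (d_out x d_in) and trains two low-rank factors
# B (d_out x r) and A (r x d_in); the update B @ A is scaled by alpha / r.
def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds for one weight matrix."""
    return r * (d_out + d_in)

d = 4096            # hypothetical hidden size of one projection matrix
r, alpha = 64, 128  # values from the hyperparameter list above

full = d * d                         # parameters in a full fine-tune of W
lora = lora_param_count(d, d, r)     # parameters LoRA trains instead
print(f"full: {full:,}  lora: {lora:,}  fraction: {lora / full:.4f}")
print(f"scaling alpha/r = {alpha / r}")
```

At r=64 the adapter trains roughly 3% of the parameters of each adapted matrix, which is what makes fine-tuning an 11B model feasible on a small GPU cluster.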

## Evaluation

The fine-tuned model demonstrates state-of-the-art performance, significantly outperforming the baseline LLaMA 3.2 model across all metrics.

| Task                  | Metric       | Baseline | **Ours (Fine-tuned)** |
|-----------------------|--------------|----------|-----------------------|
| Abnormality Detection | AUC          | 0.51     | **0.98**              |
|                       | Macro F1     | 0.33     | **0.74**              |
|                       | Hamming Loss | 0.49     | **0.11**              |
| Report Generation     | Report Score | 47.8     | **85.4**              |

*Report Score was evaluated using GPT-4o against expert-annotated ground truth reports.*
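For readers less familiar with the multi-label metrics above, the following self-contained sketch computes Macro F1 and Hamming Loss on a toy set of abnormality predictions. The labels and values are illustrative only, not taken from the evaluation:

```python
# Toy multi-label setup: 4 ECG samples, 3 hypothetical abnormality labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]

def hamming_loss(t, p):
    """Fraction of individual label decisions that are wrong."""
    flat = [(a, b) for rt, rp in zip(t, p) for a, b in zip(rt, rp)]
    return sum(a != b for a, b in flat) / len(flat)

def macro_f1(t, p):
    """F1 computed per label, then averaged with equal weight per label."""
    n_labels = len(t[0])
    f1s = []
    for j in range(n_labels):
        tp = sum(t[i][j] and p[i][j] for i in range(len(t)))
        fp = sum((not t[i][j]) and p[i][j] for i in range(len(t)))
        fn = sum(t[i][j] and (not p[i][j]) for i in range(len(t)))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_labels

print(f"Hamming Loss: {hamming_loss(y_true, y_pred):.4f}")
print(f"Macro F1:     {macro_f1(y_true, y_pred):.4f}")
```

Lower is better for Hamming Loss (0 means every label decision was correct), while higher is better for Macro F1; averaging F1 per label keeps rare abnormalities from being drowned out by common ones.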

## Citation

If you use this model in your research, please cite our paper:

```bibtex
@misc{nandakishor2025highaccuracy,
      title={High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2},
      author={Nandakishor M and Anjali M},
      year={2025},
      eprint={2501.18670},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```