---
license: llama3
language: en
library_name: unsloth
tags:
- unsloth
- llama-3.2
- vision-language-model
- ecg
- cardiology
- lora
- medical-imaging
- text-generation
- convaiinnovations
base_model: unsloth/Llama-3.2-11B-Vision-Instruct
datasets:
- ECGInstruct
---

# High-Accuracy ECG Image Interpretation with LLaMA 3.2

This repository contains the official fine-tuned model from the paper: **"High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2"**.

**Paper:** [arXiv:2501.18670](https://arxiv.org/abs/2501.18670)

This model was developed by **Nandakishor M** and **Anjali M** at **Convai Innovations**. It is designed to provide high-accuracy, comprehensive interpretation of electrocardiogram (ECG) images.

## Model Details

* **Base Model:** `unsloth/Llama-3.2-11B-Vision-Instruct`
* **Fine-tuning Strategy:** Parameter-Efficient LoRA
* **Dataset:** `ECGInstruct`, a large-scale dataset with 1 million instruction-following samples derived from public sources like MIMIC-IV ECG and PTB-XL.
* **Primary Use:** Automated analysis and report generation from ECG images to assist cardiologists and medical professionals in diagnosing a wide range of cardiac conditions.

## How to Use

This model was trained using [Unsloth](https://github.com/unslothai/unsloth) to achieve high performance and memory efficiency. The following code provides a complete example of how to load the model in 4-bit precision and run inference.

You can run the code in a free Google Colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1bL9z0NU8kuUyYescSJTIpP9NEkF2Dk6o?usp=sharing)

```python
import torch
from unsloth import FastVisionModel
from transformers import AutoProcessor, TextStreamer
from PIL import Image
from IPython.display import display

# Make sure you have an ECG image file, e.g., 'my_ecg.jpg'
image_path = "my_ecg.jpg"

# Load the 4-bit quantized model and processor
model, processor = FastVisionModel.from_pretrained(
    model_name="convaiinnovations/ECG-Instruct-Llama-3.2-11B-Vision",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
    device_map="cuda"
)

# Enable fast inference mode
FastVisionModel.for_inference(model)

# Load the image
image = Image.open(image_path).convert("RGB")

# Define the instruction
query = "You are an expert cardiologist. Write an in-depth diagnosis report from this ECG data, including the final diagnosis."

# Prepare the prompt
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": query}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)

# Process inputs
inputs = processor(
    text=input_text,
    images=image,
    return_tensors="pt",
).to("cuda")

# Set up streamer for token-by-token output
text_streamer = TextStreamer(processor.tokenizer, skip_prompt=True)

# Generate the report
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=512,
    use_cache=True,
    temperature=0.2,
    min_p=0.1,
)

# To see the input image in a notebook:
# display(image.resize((600, 400)))
```

## Training and Fine-tuning

The model was fine-tuned on the `ECGInstruct` dataset using a parameter-efficient LoRA strategy, which significantly improves performance on ECG interpretation tasks while preserving the base model's extensive knowledge.

### Key Hyperparameters:
- **LoRA Rank (`r`):** 64
- **LoRA Alpha (`alpha`):** 128
- **LoRA Dropout:** 0.05
- **Learning Rate:** 2e-4 with a cosine scheduler
- **Epochs:** 3
- **Hardware:** 4x NVIDIA A100 80GB GPUs
- **Framework:** Unsloth with DeepSpeed ZeRO-2

*Note: As described in the paper, the `lm_head` and `embed_tokens` layers were excluded from LoRA adaptation to maintain generation stability.*
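To make the parameter efficiency concrete, here is a back-of-the-envelope sketch of how many trainable parameters LoRA adds per weight matrix at the rank and alpha listed above. The 4096×4096 projection size is an illustrative assumption for a single attention matrix, not a figure from the paper:

```python
# LoRA freezes a weight W (d_out x d_in) and trains two low-rank factors
# B (d_out x r) and A (r x d_in); the update B @ A is scaled by alpha / r.
def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds for one weight matrix."""
    return r * (d_out + d_in)

d = 4096            # hypothetical hidden size of one projection matrix
r, alpha = 64, 128  # values from the hyperparameter list above

full = d * d                         # parameters in a full fine-tune of W
lora = lora_param_count(d, d, r)     # parameters LoRA trains instead
print(f"full: {full:,}  lora: {lora:,}  fraction: {lora / full:.4f}")
print(f"scaling alpha/r = {alpha / r}")
```

At r=64 the adapter trains roughly 3% of the parameters of each adapted matrix, which is what makes fine-tuning an 11B model feasible on a small GPU cluster.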

## Evaluation

The fine-tuned model demonstrates state-of-the-art performance, significantly outperforming the baseline LLaMA 3.2 model across all metrics.

| Task                  | Metric       | Baseline | **Ours (Fine-tuned)** |
|-----------------------|--------------|----------|-----------------------|
| Abnormality Detection | AUC          | 0.51     | **0.98**              |
|                       | Macro F1     | 0.33     | **0.74**              |
|                       | Hamming Loss | 0.49     | **0.11**              |
| Report Generation     | Report Score | 47.8     | **85.4**              |

*Report Score was evaluated using GPT-4o against expert-annotated ground truth reports.*
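For readers less familiar with the multi-label metrics above, the following self-contained sketch computes Macro F1 and Hamming Loss on a toy set of abnormality predictions. The labels and values are illustrative only, not taken from the evaluation:

```python
# Toy multi-label setup: 4 ECG samples, 3 hypothetical abnormality labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]

def hamming_loss(t, p):
    """Fraction of individual label decisions that are wrong."""
    flat = [(a, b) for rt, rp in zip(t, p) for a, b in zip(rt, rp)]
    return sum(a != b for a, b in flat) / len(flat)

def macro_f1(t, p):
    """F1 computed per label, then averaged with equal weight per label."""
    n_labels = len(t[0])
    f1s = []
    for j in range(n_labels):
        tp = sum(t[i][j] and p[i][j] for i in range(len(t)))
        fp = sum((not t[i][j]) and p[i][j] for i in range(len(t)))
        fn = sum(t[i][j] and (not p[i][j]) for i in range(len(t)))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_labels

print(f"Hamming Loss: {hamming_loss(y_true, y_pred):.4f}")
print(f"Macro F1:     {macro_f1(y_true, y_pred):.4f}")
```

Lower is better for Hamming Loss (0 means every label decision was correct), while higher is better for Macro F1; averaging F1 per label keeps rare abnormalities from being drowned out by common ones.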

## Citation

If you use this model in your research, please cite our paper:

```bibtex
@misc{nandakishor2025highaccuracy,
      title={High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2},
      author={Nandakishor M and Anjali M},
      year={2025},
      eprint={2501.18670},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```