---
title: CXR-Findings-AI
emoji: 🫁
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
license: mit
pinned: true
tags:
  - gradio
  - pytorch
  - computer-vision
  - nlp
  - multimodal
  - vision-language
  - image-to-text
  - chest-xray
  - radiology
  - medical-ai
  - attention
  - attention-visualization
  - interpretability
  - explainable-ai
  - xai
  - cpu-inference
  - healthcare
  - demo
short_description: Generate chest X-ray findings and explore attention.
---

# 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer

### **Live Demo (CPU-Only, thanks to Hugging Face Spaces)**

🔗 **[https://huggingface.co/spaces/manu02/CXR-Findings-AI](https://huggingface.co/spaces/manu02/CXR-Findings-AI)**

![App working](assets/app_view.png)

---

# 🧠 Overview

**CXR-Findings-AI** is an interactive Gradio application that:

### ✅ **Generates radiology findings from a chest X-ray image**
### ✅ **Visualizes multimodal attention (image ↔ text) across layers and heads**
### ✅ **Runs entirely on CPU**, showcasing the efficiency of the underlying 246M-parameter model

The system lets researchers, clinicians, and students explore **how different image regions influence each generated word**, enabling deeper interpretability in medical AI.

---

# 🔍 What This App Provides

### 🫁 **1. Findings Generation**

A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.

### 👁️ **2. Layer-wise & Head-wise Attention Visualization**

Inspect how the model distributes attention:

* Across **transformer layers**
* Across **attention heads**
* Between **image tokens** (32×32 grid → 1024 tokens)
* And **generated text tokens**

### 🎨 **3. Three Synchronized Views**

For each selected word:

1. **Original Image**
2. **Overlay View:** Image + blended attention map
3. **Pure Heatmap:** Raw attention intensities

### 🧩 **4. Word-Level Interpretability**

Click any word in the generated report to reveal its cross-modal attention patterns.

---

# 🚀 Quickstart (Local Usage)

### 1) Clone

```bash
git clone https://github.com/devMuniz02/Image-Attention-Visualizer
cd Image-Attention-Visualizer
```

### 2) (Optional) Create a virtual environment

**Windows:**

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```

**macOS / Linux:**

```bash
python3 -m venv venv
source venv/bin/activate
```

### 3) Install requirements

```bash
pip install -r requirements.txt
```

### 4) Run the app

```bash
python app.py
```

Then open:

```
http://127.0.0.1:7860
```

---

# 🧭 How to Use the Interface

1. **Upload a chest X-ray** (or load a sample)
2. Adjust:
   * Max new tokens
   * Layer selection
   * Head selection
   * Or choose *mean* attention across all layers/heads
3. Click **Generate Findings**
4. Click any generated word to visualize:
   * Image ↔ text attention
   * Heatmaps
   * Cross-token relationships
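Under the hood, the word-level view boils down to reshaping the selected word's cross-attention over the 1024 image tokens back into the 32×32 grid, upsampling it, and blending it over the X-ray. The sketch below illustrates that idea; it assumes a Hugging Face-style `generate()` whose `cross_attentions` is a nested tuple (one entry per generated step, each a tuple of per-layer tensors), and `model` / `pixel_values` are placeholders rather than this repo's exact API.

```python
# Illustrative sketch (assumed tensor layout, not this repo's exact API):
# cross_attentions[step][layer] has shape [batch, heads, 1, 1024].
import numpy as np
import torch
from PIL import Image

GRID = 32  # 32x32 image grid -> 1024 image tokens


def word_heatmap(cross_attentions, step, layer=None, head=None):
    """Attention for one generated step -> normalized 32x32 map.
    layer=None / head=None reproduces the app's *mean* option."""
    layers = cross_attentions[step]  # tuple of [B, H, 1, 1024] tensors
    attn = torch.stack(layers).mean(0) if layer is None else layers[layer]
    attn = attn.mean(1) if head is None else attn[:, head]  # -> [B, 1, 1024]
    grid = attn[0, 0].reshape(GRID, GRID)                   # [1024] -> [32, 32]
    grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-8)
    return grid.cpu().numpy()


def overlay(image: Image.Image, grid: np.ndarray, alpha: float = 0.5):
    """Upsample the 32x32 map to image size and blend it over the X-ray."""
    heat = Image.fromarray((grid * 255).astype(np.uint8)).resize(image.size, Image.BILINEAR)
    zeros = Image.new("L", image.size, 0)
    return Image.blend(image.convert("RGB"), Image.merge("RGB", (heat, zeros, zeros)), alpha)


# Usage (model / pixel_values are placeholders):
# out = model.generate(pixel_values=pixel_values, output_attentions=True,
#                      return_dict_in_generate=True, max_new_tokens=128)
# overlay(Image.open("xray.png"), word_heatmap(out.cross_attentions, step=5)).show()
```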
---

# 🧩 Repository Structure

| File | Description |
| -------------------------------- | --------------------------------------------- |
| `app.py` | Main Gradio interface and visualization logic |
| `utils/models/complete_model.py` | Full multimodal model assembly |
| `utils/processing.py` | Image preprocessing |
| `assets/` | UI images & examples |
| `requirements.txt` | Dependencies |
| `README.md` | This file |

---

# 🛠️ Troubleshooting

* **Blank heatmap** → Ensure `output_attentions=True` in `.generate()`
* **Distorted attention** → Check that the image-token count is 1024 (32×32)
* **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
* **OOM on local machine** → Reduce `max_new_tokens` or use CPU-only settings
* **Slow inference** → CPU mode is intentionally lightweight; a GPU is recommended for higher throughput

---

# 🧪 Model Integration Notes

Compatible with any encoder–decoder or vision–language model that:

* Accepts `pixel_values`
* Returns attentions when calling

```python
model.generate(..., output_attentions=True)
```

* Provides a decoder tokenizer:

```python
model.decoder.tokenizer
```

A minimal end-to-end sketch combining these requirements appears in the appendix at the end of this README.

Ideal for research in:

* Medical AI
* Vision–language alignment
* Cross-modal interpretability
* Attention visualization
* Explainable AI (XAI)

---

# ❤️ Acknowledgments

* Powered by **Gradio** and **Hugging Face Transformers**
* Based on and expanded from the **Token-Attention-Viewer** project
  🔗 [https://github.com/devMuniz02/Image-Attention-Visualizer](https://github.com/devMuniz02/Image-Attention-Visualizer)
* Created as part of a thesis on **efficient and explainable multimodal medical AI**
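---

# 🔌 Appendix: Minimal Integration Sketch

As referenced in **Model Integration Notes**, the sketch below ties the three compatibility requirements together: pass `pixel_values`, request attentions from `.generate()`, and decode with `model.decoder.tokenizer`. It is an illustrative sketch, not this repo's exact API — `model` and `pixel_values` are placeholders, and the attention layout assumes a Hugging Face-style nested `cross_attentions` tuple.

```python
import torch


@torch.no_grad()
def generate_with_attentions(model, pixel_values, max_new_tokens=128):
    """Run generation and return (tokens, cross_attentions) for visualization."""
    out = model.generate(
        pixel_values=pixel_values,      # requirement: image tensor input
        max_new_tokens=max_new_tokens,
        output_attentions=True,         # requirement: attentions for the heatmaps
        return_dict_in_generate=True,   # exposes .sequences and .cross_attentions
    )
    # requirement: decoder tokenizer for mapping ids back to words
    tokens = model.decoder.tokenizer.convert_ids_to_tokens(out.sequences[0])
    # Sanity check from Troubleshooting: expect 1024 image tokens (32x32 grid).
    n_image_tokens = out.cross_attentions[0][0].shape[-1]
    assert n_image_tokens == 1024, f"expected 1024 image tokens, got {n_image_tokens}"
    return tokens, out.cross_attentions
```

The `(tokens, cross_attentions)` pair returned here is, roughly, what the UI visualizes when you click a word in the generated report.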