---
title: CXR-Findings-AI
emoji: 🫁
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
license: mit
pinned: true
tags:
  - gradio
  - pytorch
  - computer-vision
  - nlp
  - multimodal
  - vision-language
  - image-to-text
  - chest-xray
  - radiology
  - medical-ai
  - attention
  - attention-visualization
  - interpretability
  - explainable-ai
  - xai
  - cpu-inference
  - healthcare
  - demo
short_description: Generate chest X-ray findings and explore attention.
---

# 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer

### **Live Demo (CPU-Only, thanks to Hugging Face Spaces)**

🔗 **[https://huggingface.co/spaces/manu02/CXR-Findings-AI](https://huggingface.co/spaces/manu02/CXR-Findings-AI)**

![App working](assets/app_view.png)

---

# 🧠 Overview

**CXR-Findings-AI** is an interactive Gradio application that:

### ✅ **Generates radiology findings from a chest X-ray image**
### ✅ **Visualizes multimodal attention (image ↔ text) across layers and heads**
### ✅ **Runs entirely on CPU**, showcasing the efficiency of the underlying 246M-parameter model

The system lets researchers, clinicians, and students explore **how different image regions influence each generated word**, enabling deeper interpretability in medical AI.

---

# 🔍 What This App Provides

### 🫁 **1. Findings Generation**

A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.

### 👁️ **2. Layer-wise & Head-wise Attention Visualization**

Inspect how the model distributes attention:

* Across **transformer layers**
* Across **attention heads**
* Between **image tokens** (32×32 grid → 1024 tokens)
* And **generated text tokens**

### 🎨 **3. Three Synchronized Views**

For each selected word:

1. **Original Image**
2. **Overlay View:** Image + blended attention map
3. **Pure Heatmap:** Raw attention intensities

### 🧩 **4. Word-Level Interpretability**

Click any word in the generated report to reveal its cross-modal attention patterns.

---

# 🚀 Quickstart (Local Usage)

### 1) Clone

```bash
git clone https://github.com/devMuniz02/Image-Attention-Visualizer
cd Image-Attention-Visualizer
```

### 2) (Optional) Create a virtual environment

**Windows:**

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```

**macOS / Linux:**

```bash
python3 -m venv venv
source venv/bin/activate
```

### 3) Install requirements

```bash
pip install -r requirements.txt
```

### 4) Run the app

```bash
python app.py
```

Then open:

```
http://127.0.0.1:7860
```

---

# 🧭 How to Use the Interface

1. **Upload a chest X-ray** (or load a sample)
2. Adjust:
   * Max new tokens
   * Layer selection
   * Head selection
   * Or choose *mean* attention across all layers/heads
3. Click **Generate Findings**
4. Click any generated word to visualize:
   * Image ↔ text attention
   * Heatmaps
   * Cross-token relationships
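Under the hood, the word-level view boils down to reshaping the selected word's cross-attention over the 1024 image tokens back into the 32×32 grid, upsampling it, and blending it over the X-ray. The sketch below illustrates that idea; it assumes a Hugging Face-style `generate()` whose `cross_attentions` is a nested tuple (one entry per generated step, each a tuple of per-layer tensors), and `model` / `pixel_values` are placeholders rather than this repo's exact API.

```python
# Illustrative sketch (assumed tensor layout, not this repo's exact API):
# cross_attentions[step][layer] has shape [batch, heads, 1, 1024].
import numpy as np
import torch
from PIL import Image

GRID = 32  # 32x32 image grid -> 1024 image tokens


def word_heatmap(cross_attentions, step, layer=None, head=None):
    """Attention for one generated step -> normalized 32x32 map.
    layer=None / head=None reproduces the app's *mean* option."""
    layers = cross_attentions[step]  # tuple of [B, H, 1, 1024] tensors
    attn = torch.stack(layers).mean(0) if layer is None else layers[layer]
    attn = attn.mean(1) if head is None else attn[:, head]  # -> [B, 1, 1024]
    grid = attn[0, 0].reshape(GRID, GRID)                   # [1024] -> [32, 32]
    grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-8)
    return grid.cpu().numpy()


def overlay(image: Image.Image, grid: np.ndarray, alpha: float = 0.5):
    """Upsample the 32x32 map to image size and blend it over the X-ray."""
    heat = Image.fromarray((grid * 255).astype(np.uint8)).resize(image.size, Image.BILINEAR)
    zeros = Image.new("L", image.size, 0)
    return Image.blend(image.convert("RGB"), Image.merge("RGB", (heat, zeros, zeros)), alpha)


# Usage (model / pixel_values are placeholders):
# out = model.generate(pixel_values=pixel_values, output_attentions=True,
#                      return_dict_in_generate=True, max_new_tokens=128)
# overlay(Image.open("xray.png"), word_heatmap(out.cross_attentions, step=5)).show()
```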
---

# 🧩 Repository Structure

| File | Description |
| -------------------------------- | --------------------------------------------- |
| `app.py` | Main Gradio interface and visualization logic |
| `utils/models/complete_model.py` | Full multimodal model assembly |
| `utils/processing.py` | Image preprocessing |
| `assets/` | UI images & examples |
| `requirements.txt` | Dependencies |
| `README.md` | This file |

---

# 🛠️ Troubleshooting

* **Blank heatmap** → Ensure `output_attentions=True` in `.generate()`
* **Distorted attention** → Check that the image-token count is 1024 (32×32)
* **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
* **OOM on local machine** → Reduce `max_new_tokens` or use CPU-only settings
* **Slow inference** → CPU mode is intentionally lightweight; a GPU is recommended for higher throughput

---

# 🧪 Model Integration Notes

Compatible with any encoder–decoder or vision–language model that:

* Accepts `pixel_values`
* Returns attentions when calling

```python
model.generate(..., output_attentions=True)
```

* Provides a decoder tokenizer:

```python
model.decoder.tokenizer
```

A minimal end-to-end sketch combining these requirements appears in the appendix at the end of this README.

Ideal for research in:

* Medical AI
* Vision–language alignment
* Cross-modal interpretability
* Attention visualization
* Explainable AI (XAI)

---

# ❤️ Acknowledgments

* Powered by **Gradio** and **Hugging Face Transformers**
* Based on and expanded from the **Token-Attention-Viewer** project
  🔗 [https://github.com/devMuniz02/Image-Attention-Visualizer](https://github.com/devMuniz02/Image-Attention-Visualizer)
* Created as part of a thesis on **efficient and explainable multimodal medical AI**
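---

# 🔌 Appendix: Minimal Integration Sketch

As referenced in **Model Integration Notes**, the sketch below ties the three compatibility requirements together: pass `pixel_values`, request attentions from `.generate()`, and decode with `model.decoder.tokenizer`. It is an illustrative sketch, not this repo's exact API — `model` and `pixel_values` are placeholders, and the attention layout assumes a Hugging Face-style nested `cross_attentions` tuple.

```python
import torch


@torch.no_grad()
def generate_with_attentions(model, pixel_values, max_new_tokens=128):
    """Run generation and return (tokens, cross_attentions) for visualization."""
    out = model.generate(
        pixel_values=pixel_values,      # requirement: image tensor input
        max_new_tokens=max_new_tokens,
        output_attentions=True,         # requirement: attentions for the heatmaps
        return_dict_in_generate=True,   # exposes .sequences and .cross_attentions
    )
    # requirement: decoder tokenizer for mapping ids back to words
    tokens = model.decoder.tokenizer.convert_ids_to_tokens(out.sequences[0])
    # Sanity check from Troubleshooting: expect 1024 image tokens (32x32 grid).
    n_image_tokens = out.cross_attentions[0][0].shape[-1]
    assert n_image_tokens == 1024, f"expected 1024 image tokens, got {n_image_tokens}"
    return tokens, out.cross_attentions
```

The `(tokens, cross_attentions)` pair returned here is, roughly, what the UI visualizes when you click a word in the generated report.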