# Asset from the SCALEMED Framework

This model/dataset is an asset released as part of the **SCALEMED** framework, a project focused on developing scalable and resource-efficient medical AI assistants.

## Project Overview

The models, known as **DermatoLlama**, were trained on versions of the **DermaSynth** dataset, which was also generated using the SCALEMED pipeline.

For a complete overview of the project, including all related models, datasets, and source code, please visit our Hugging Face organization and GitHub pages: <br>
**[https://huggingface.co/DermaVLM](https://huggingface.co/DermaVLM)** <br>
**[https://github.com/DermaVLM](https://github.com/DermaVLM)** <br>

## Requirements and Our Test System
transformers==4.57.1 <br>
accelerate==1.8.1 <br>
pillow==11.0.0 <br>
peft==0.16.0 <br>
torch==2.7.1+cu126 <br>
torchaudio==2.7.1+cu126 <br>
torchvision==0.22.1+cu126 <br>
python==3.11.13 <br>

CUDA: 12.6 <br>
Driver version: 560.94 <br>
GPU: 1x RTX 4090 <br>
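
If you want to verify that your environment matches this tested configuration, a quick check such as the one below can help. This is a minimal sketch; the package names are the standard PyPI distribution names:

```python
# Optional: print installed versions to compare against the tested setup above.
import importlib.metadata as md

import torch

for pkg in ("transformers", "accelerate", "pillow", "peft"):
    print(f"{pkg}=={md.version(pkg)}")

print(f"torch=={torch.__version__}")                   # e.g. 2.7.1+cu126
print(f"CUDA runtime: {torch.version.cuda}")           # e.g. 12.6
print(f"CUDA available: {torch.cuda.is_available()}")
```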

## Usage

```python
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load base model
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(base_model_name)

# Load LoRA adapter
adapter_path = "DermaVLM/DermatoLLama-full"
model = PeftModel.from_pretrained(model, adapter_path)
# Load the image with Pillow
image_path = "IMAGE_LOCATION"  # Replace with the path to your image
image = Image.open(image_path).convert("RGB")  # ensure a 3-channel RGB input

prompt_text = "Analyze the dermatological condition shown in the image and provide a detailed report including body location."
# Build the chat message: an image placeholder followed by the text prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt_text},
        ],
    }
]

input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

# Prepare final inputs with the loaded image
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 512,  # long outputs increase inference time; lower this if needed
    "do_sample": True,
    "temperature": 0.4,
    "top_p": 0.95,
}
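# Optional: for deterministic outputs, set "do_sample": False and omit "temperature"/"top_p".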

input_length = inputs.input_ids.shape[1]

print(f"Processing image: {image_path}")
print(f"Image size: {image.size}")
print("Generating response...")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )
    generated_tokens = outputs[0][input_length:]
    raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

print("\n" + "="*50)
print("DERMATOLOGY ANALYSIS:")
print("="*50)
print(raw_output)
print("="*50)
```
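
If you process many images, you may prefer to merge the LoRA adapter into the base weights once and then run inference without the PEFT wrapper. The sketch below uses PEFT's `merge_and_unload()`; it assumes `model` and `processor` were loaded as in the snippet above, and the output directory name is only an example:

```python
# Optional: merge the LoRA weights into the base model for slightly faster inference.
# After merging, `model` behaves like a plain MllamaForConditionalGeneration.
model = model.merge_and_unload()

# The merged model can be saved and later reloaded without peft
# (directory name is an example, choose your own):
model.save_pretrained("dermatollama-11b-merged")
processor.save_pretrained("dermatollama-11b-merged")
```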

## Citation

If you use this model, dataset, or any other asset from our work in your research, we kindly ask that you cite our preprint:

```bibtex
@article {Yilmaz2025-DermatoLlama-VLM,
	author = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
	title = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
	year = {2025},
	doi = {10.1101/2025.05.17.25327785},
	url = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
	journal = {medRxiv}
}
```