# Asset from the SCALEMED Framework
This model is released as part of the **SCALEMED** framework, a project focused on developing scalable, resource-efficient medical AI assistants.
## Project Overview
The resulting models, **DermatoLlama**, were trained on versions of the **DermaSynth** dataset, which was itself generated with the SCALEMED pipeline.
For a complete overview of the project, including all related models, datasets, and source code, please visit our Hugging Face organization and GitHub pages:
**[https://huggingface.co/DermaVLM](https://huggingface.co/DermaVLM)**
**[https://github.com/DermaVLM](https://github.com/DermaVLM)**
## Requirements and Test System
The code below was tested with the following package versions:
```
transformers==4.57.1
accelerate==1.8.1
pillow==11.0.0
peft==0.16.0
torch==2.7.1+cu126
torchaudio==2.7.1+cu126
torchvision==0.22.1+cu126
```
Test system: Python 3.11.13, CUDA 12.6, NVIDIA driver 560.94, 1x RTX 4090.
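To sanity-check that a local environment matches the tested stack, a quick version check can help (a minimal sketch; the pinned versions above are what we tested, not strict minimums):
```python
# Minimal environment check -- a sketch; the versions printed here should
# roughly match the tested stack listed above.
import PIL
import peft
import torch
import transformers

print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("torch:", torch.__version__)
print("pillow:", PIL.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```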
## Usage
```python
# %%
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image
# Load base model
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(base_model_name)
# Load LoRA adapter
adapter_path = "DermaVLM/DermatoLLama-full"
model = PeftModel.from_pretrained(model, adapter_path)
# %%
# Load image using Pillow
image_path = r"IMAGE_LOCATION"  # Replace with your image path
image = Image.open(image_path).convert("RGB")  # ensure a 3-channel RGB image
prompt_text = "Analyze the dermatological condition shown in the image and provide a detailed report including body location."
messages = []
content_list = []
# Add the image to the content
if image:
    content_list.append({"type": "image"})
# Add the text part of the prompt
content_list.append({"type": "text", "text": prompt_text})
messages.append({"role": "user", "content": content_list})
input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

# Prepare final inputs with the loaded image
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 512,  # be careful with this, it can cause very long inference times
    "do_sample": True,
    "temperature": 0.4,
    "top_p": 0.95,
}
input_length = inputs.input_ids.shape[1]
print(f"Processing image: {image_path}")
print(f"Image size: {image.size}")
print("Generating response...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )
generated_tokens = outputs[0][input_length:]
raw_output = processor.decode(generated_tokens, skip_special_tokens=True)
print("\n" + "="*50)
print("DERMATOLOGY ANALYSIS:")
print("="*50)
print(raw_output)
print("="*50)
```
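If the model will be queried repeatedly, the LoRA adapter can optionally be merged into the base weights to remove the adapter indirection at inference time. A minimal sketch using peft's `merge_and_unload`, assuming the model was loaded as above:
```python
# Optional: fold the LoRA weights into the base model. merge_and_unload()
# returns the underlying base model with the adapter deltas applied, so the
# PeftModel wrapper (and its extra forward-pass overhead) is discarded.
merged_model = model.merge_and_unload()

# merged_model.generate(...) is then used exactly like `model` above.
```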
## Citation
If you use this model, dataset, or any other asset from our work in your research, please cite our preprint:
```bibtex
@article{Yilmaz2025-DermatoLlama-VLM,
  author  = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
  title   = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
  year    = {2025},
  journal = {medRxiv},
  doi     = {10.1101/2025.05.17.25327785},
  url     = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785}
}
```