# Asset from the SCALEMED Framework

This model/dataset is an asset released as part of the **SCALEMED** framework, a project focused on developing scalable and resource-efficient medical AI assistants.

## Project Overview

The models, known as **DermatoLlama**, were trained on versions of the **DermaSynth** dataset, which was also generated using the SCALEMED pipeline.

For a complete overview of the project, including all related models, datasets, and source code, please visit our Hugging Face organization and GitHub pages: <br>
**[https://huggingface.co/DermaVLM](https://huggingface.co/DermaVLM)** <br>
**[https://github.com/DermaVLM](https://github.com/DermaVLM)** <br>

## Requirements and Our Test System
transformers==4.57.1 <br>
accelerate==1.8.1 <br>
pillow==11.0.0 <br>
peft==0.16.0 <br>
torch==2.7.1+cu126 <br>
torchaudio==2.7.1+cu126 <br>
torchvision==0.22.1+cu126 <br>
python==3.11.13 <br>

CUDA: 12.6 <br>
Driver version: 560.94 <br>
GPU: 1x RTX 4090 <br>
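
If you want to verify that your environment matches this tested configuration, a quick check such as the one below can help. This is a minimal sketch; the package names are the standard PyPI distribution names:

```python
# Optional: print installed versions to compare against the tested setup above.
import importlib.metadata as md

import torch

for pkg in ("transformers", "accelerate", "pillow", "peft"):
    print(f"{pkg}=={md.version(pkg)}")

print(f"torch=={torch.__version__}")                   # e.g. 2.7.1+cu126
print(f"CUDA runtime: {torch.version.cuda}")           # e.g. 12.6
print(f"CUDA available: {torch.cuda.is_available()}")
```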

## Usage

```python
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load base model
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(base_model_name)

# Load LoRA adapter
adapter_path = "DermaVLM/DermatoLLama-full"
model = PeftModel.from_pretrained(model, adapter_path)
# Load the image with Pillow
image_path = "IMAGE_LOCATION"  # Replace with the path to your image
image = Image.open(image_path).convert("RGB")  # ensure a 3-channel RGB input

prompt_text = "Analyze the dermatological condition shown in the image and provide a detailed report including body location."
# Build the chat message: an image placeholder followed by the text prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt_text},
        ],
    }
]

input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

# Prepare final inputs with the loaded image
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 512,  # long outputs increase inference time; lower this if needed
    "do_sample": True,
    "temperature": 0.4,
    "top_p": 0.95,
}
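# Optional: for deterministic outputs, set "do_sample": False and omit "temperature"/"top_p".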

input_length = inputs.input_ids.shape[1]

print(f"Processing image: {image_path}")
print(f"Image size: {image.size}")
print("Generating response...")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )
    generated_tokens = outputs[0][input_length:]
    raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

print("\n" + "="*50)
print("DERMATOLOGY ANALYSIS:")
print("="*50)
print(raw_output)
print("="*50)
```
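
If you process many images, you may prefer to merge the LoRA adapter into the base weights once and then run inference without the PEFT wrapper. The sketch below uses PEFT's `merge_and_unload()`; it assumes `model` and `processor` were loaded as in the snippet above, and the output directory name is only an example:

```python
# Optional: merge the LoRA weights into the base model for slightly faster inference.
# After merging, `model` behaves like a plain MllamaForConditionalGeneration.
model = model.merge_and_unload()

# The merged model can be saved and later reloaded without peft
# (directory name is an example, choose your own):
model.save_pretrained("dermatollama-11b-merged")
processor.save_pretrained("dermatollama-11b-merged")
```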

## Citation

If you use this model, dataset, or any other asset from our work in your research, we kindly ask that you cite our preprint:

```bibtex
@article {Yilmaz2025-DermatoLlama-VLM,
	author = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
	title = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
	year = {2025},
	doi = {10.1101/2025.05.17.25327785},
	url = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
	journal = {medRxiv}
}
```