# Asset from the SCALEMED Framework
This model/dataset is an asset released as part of the **SCALEMED** framework, a project focused on developing scalable and resource-efficient medical AI assistants.
## Project Overview
The **DermatoLlama** models were trained on versions of the **DermaSynth** dataset, which was itself generated with the SCALEMED pipeline.
For a complete overview of the project, including all related models, datasets, and source code, please visit our Hugging Face organization and GitHub pages: <br>
**[https://huggingface.co/DermaVLM](https://huggingface.co/DermaVLM)** <br>
**[https://github.com/DermaVLM](https://github.com/DermaVLM)** <br>
## Requirements and Our Test System
transformers==4.57.1 <br>
accelerate==1.8.1 <br>
pillow==11.0.0 <br>
peft==0.16.0 <br>
torch==2.7.1+cu126 <br>
torchaudio==2.7.1+cu126 <br>
torchvision==0.22.1+cu126 <br>
python==3.11.13 <br>
CUDA: 12.6 <br>
Driver Version: 560.94 <br>
GPU: 1x RTX 4090 <br>
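
A quick way to check your environment against these tested versions is a minimal sketch using only the standard-library `importlib.metadata` (the dictionary below simply restates the pip versions listed above):

```python
from importlib.metadata import version, PackageNotFoundError

# Versions we tested against (see the list above)
expected = {
    "transformers": "4.57.1",
    "accelerate": "1.8.1",
    "pillow": "11.0.0",
    "peft": "0.16.0",
    "torch": "2.7.1+cu126",
    "torchaudio": "2.7.1+cu126",
    "torchvision": "0.22.1+cu126",
}

for package, tested in expected.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        installed = "not installed"
    marker = "" if installed == tested else "  <-- differs from tested version"
    print(f"{package}: {installed} (tested: {tested}){marker}")
```

A mismatch does not necessarily mean the model will fail to load, but the versions above are the only combination we have verified.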
## Usage
```python
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load the base vision-language model and its processor
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(base_model_name)

# Load the DermatoLlama LoRA adapter on top of the base model
adapter_path = "DermaVLM/DermatoLLama-full"
model = PeftModel.from_pretrained(model, adapter_path)

# Load the image using Pillow
image_path = "IMAGE_LOCATION"  # Replace with your image path
image = Image.open(image_path)

prompt_text = (
    "Analyze the dermatological condition shown in the image "
    "and provide a detailed report including body location."
)

# Build a single-turn chat message: an image slot followed by the text prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt_text},
        ],
    }
]
input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

# Prepare the final model inputs with the loaded image
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 512,  # be careful: large values can cause very long inference times
    "do_sample": True,
    "temperature": 0.4,
    "top_p": 0.95,
}

input_length = inputs.input_ids.shape[1]
print(f"Processing image: {image_path}")
print(f"Image size: {image.size}")
print("Generating response...")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )

# Decode only the newly generated tokens, skipping the prompt
generated_tokens = outputs[0][input_length:]
raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

print("\n" + "=" * 50)
print("DERMATOLOGY ANALYSIS:")
print("=" * 50)
print(raw_output)
print("=" * 50)
```
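
If you plan to run many inferences, the adapter can optionally be folded into the base weights with PEFT's standard `merge_and_unload()`, which removes the adapter indirection at generation time. A minimal sketch, assuming the `model` and `processor` objects from the example above (the output directory name is a placeholder of our choosing):

```python
# Merge the LoRA adapter into the base model weights and drop the PEFT wrapper
merged_model = model.merge_and_unload()

# Optionally save the merged weights so they can be loaded standalone later
# ("DermatoLlama-merged" is a placeholder output directory)
merged_model.save_pretrained("DermatoLlama-merged")
processor.save_pretrained("DermatoLlama-merged")
```

The merged checkpoint can then be loaded directly with `MllamaForConditionalGeneration.from_pretrained("DermatoLlama-merged")`, without the `PeftModel` step.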
## Citation
If you use this model, dataset, or any other asset from our work in your research, we kindly ask that you cite our preprint:
```bibtex
@article{Yilmaz2025-DermatoLlama-VLM,
author = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
title = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
year = {2025},
doi = {10.1101/2025.05.17.25327785},
url = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
journal = {medRxiv}
}
``` |