# Asset from the SCALEMED Framework

This model/dataset is an asset released as part of the **SCALEMED** framework, a project focused on developing scalable and resource-efficient medical AI assistants.

## Project Overview

The models, known as **DermatoLlama**, were trained on versions of the **DermaSynth** dataset, which was also generated using the SCALEMED pipeline. For a complete overview of the project, including all related models, datasets, and the source code, please visit our main Hugging Face organization page:
**[https://huggingface.co/DermaVLM](https://huggingface.co/DermaVLM)**
**[https://github.com/DermaVLM](https://github.com/DermaVLM)**
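If you want to experiment with the training data, the companion **DermaSynth** dataset can be loaded with the `datasets` library. A minimal sketch, assuming the dataset is hosted under the id `DermaVLM/DermaSynth` (verify the exact repository name and split names on the organization page above):

```python
from datasets import load_dataset

# NOTE: the repository id and split below are assumptions; check the
# DermaVLM organization page for the exact names.
ds = load_dataset("DermaVLM/DermaSynth", split="train")
print(ds[0])  # inspect one example (image plus instruction/answer fields)
```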
## Requirements and Our Test System

```
transformers==4.57.1
accelerate==1.8.1
pillow==11.0.0
peft==0.16.0
torch==2.7.1+cu126
torchaudio==2.7.1+cu126
torchvision==0.22.1+cu126
python==3.11.13
CUDA: 12.6
Driver Version: 560.94
GPU: 1x RTX 4090
```
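Before running the example below, it can help to verify that your environment matches the tested versions above. A minimal sanity-check sketch:

```python
import torch
import transformers
import peft

# Compare against the pinned versions listed above
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("torch:", torch.__version__)  # expected: 2.7.1+cu126
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```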
## Usage

```python
# %%
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load the base model
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(base_model_name)

# Load the LoRA adapter
adapter_path = "DermaVLM/DermatoLLama-full"
model = PeftModel.from_pretrained(model, adapter_path)
# Optional: merge the LoRA weights into the base model to remove adapter
# overhead at inference time (not required; keep the adapter separate if
# you want to swap adapters later):
# model = model.merge_and_unload()

# %%
# Load the image with Pillow
image_path = "IMAGE_LOCATION"  # replace with your image path
image = Image.open(image_path).convert("RGB")  # ensure a 3-channel RGB image

prompt_text = (
    "Analyze the dermatological condition shown in the image and provide a "
    "detailed report including body location."
)

# Build a single-turn chat message containing the image and the text prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt_text},
        ],
    }
]

input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

# Prepare the final inputs with the loaded image
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 512,  # be careful: large values can cause very long inference times
    "do_sample": True,
    "temperature": 0.4,
    "top_p": 0.95,
}

input_length = inputs.input_ids.shape[1]

print(f"Processing image: {image_path}")
print(f"Image size: {image.size}")
print("Generating response...")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )

# Decode only the newly generated tokens (skip the prompt)
generated_tokens = outputs[0][input_length:]
raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

print("\n" + "=" * 50)
print("DERMATOLOGY ANALYSIS:")
print("=" * 50)
print(raw_output)
print("=" * 50)
```

## Citation

If you use this model, dataset, or any other asset from our work in your research, we kindly ask that you cite our preprint:

```bibtex
@article{Yilmaz2025-DermatoLlama-VLM,
  author  = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
  title   = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
  year    = {2025},
  doi     = {10.1101/2025.05.17.25327785},
  url     = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
  journal = {medRxiv}
}
```