Asset from the SCALEMED Framework

This model is an asset released as part of the SCALEMED framework, a project focused on developing scalable and resource-efficient medical AI assistants.

Project Overview

The DermatoLlama models were fine-tuned on versions of the DermaSynth dataset, which was itself generated with the SCALEMED pipeline.

For a complete overview of the project, including all related models, datasets, and the source code, please visit our main Hugging Face organization page and GitHub repositories:
https://huggingface.co/DermaVLM
https://github.com/DermaVLM

Requirements and Our Test System

transformers==4.57.1
accelerate==1.8.1
pillow==11.0.0
peft==0.16.0
torch==2.7.1+cu126
torchaudio==2.7.1+cu126
torchvision==0.22.1+cu126
python==3.11.13

CUDA: 12.6
Driver version: 560.94
GPU: 1x RTX 4090
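
A quick way to check that your environment matches these pins (a minimal sketch that only prints version strings and the detected GPU):

import PIL
import peft
import torch
import transformers

# Compare the installed versions against the pins listed above.
print("transformers:", transformers.__version__)  # expect 4.57.1
print("peft:", peft.__version__)                  # expect 0.16.0
print("pillow:", PIL.__version__)                 # expect 11.0.0
print("torch:", torch.__version__)                # expect 2.7.1+cu126
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 4090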

Usage

# %%
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load base model
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(base_model_name)
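
# Note: the bf16 weights of the 11B base model take roughly 22 GB, which
# fits on the 24 GB RTX 4090 listed above. On smaller GPUs, one option is
# to load the base model in 4-bit with bitsandbytes before attaching the
# adapter (bitsandbytes is NOT in the pinned requirements, so treat this
# as an untested variant):
#
#   from transformers import BitsAndBytesConfig
#   bnb = BitsAndBytesConfig(load_in_4bit=True,
#                            bnb_4bit_compute_dtype=torch.bfloat16)
#   model = MllamaForConditionalGeneration.from_pretrained(
#       base_model_name, quantization_config=bnb, device_map="auto")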

# Load LoRA adapter
adapter_path = "DermaVLM/DermatoLLama-full"
model = PeftModel.from_pretrained(model, adapter_path)
# %%
# Load image using Pillow
image_path = "IMAGE_LOCATION"  # Replace with your image path
image = Image.open(image_path).convert("RGB")  # ensure a 3-channel image for the processor

prompt_text = "Analyze the dermatological condition shown in the image and provide a detailed report including body location."
# Build a single-turn chat message: the image placeholder followed by the text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt_text},
        ],
    }
]

input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

# Prepare final inputs with the loaded image
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 512,  # upper bound on generated tokens; large values can make inference slow
    "do_sample": True,      # sample rather than greedy decode
    "temperature": 0.4,
    "top_p": 0.95,          # nucleus sampling cutoff
}

input_length = inputs.input_ids.shape[1]

print(f"Processing image: {image_path}")
print(f"Image size: {image.size}")
print("Generating response...")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )
    generated_tokens = outputs[0][input_length:]
    raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

print("\n" + "="*50)
print("DERMATOLOGY ANALYSIS:")
print("="*50)
print(raw_output)
print("="*50)

Citation

If you use this model, dataset, or any other asset from our work in your research, please cite our preprint:

@article{Yilmaz2025-DermatoLlama-VLM,
    author = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
    title = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
    year = {2025},
    doi = {10.1101/2025.05.17.25327785},
    url = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
    journal = {medRxiv}
}