Not-For-All-Audiences

You can caption files like this (ms-swift formatting):

{"messages": [{"role": "user", "content": "<image>Write a detailed caption for this image, maximum 1000 words. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, and spatial relationships. Use vulgar descriptions of what the characters are doing."}], "images": ["josman_images/97fecadaa022a4bf7a4947edc1f3260a.webp"]}

About the model: The model was fine-tuned on synthetic reasoning data with 500 curated, high-quality captions, where the reasoning includes contrastive examples of the model 'fixing its own mistakes'. It was also trained on 9000 image-tag pairs with synthetic chain of thought. The chain of thought was generated by Gemini 3.1 Flash (tags) and Pro (captions). All tags and captions were very high quality.

What is in this repo: Qwen3.5 9B with a rank-32 LoRA fused in, plus a fully fine-tuned projector (visual.merger). The model has been fused twice: it was trained on top of my captioning v1 model, which is itself a base model with a rank-32 LoRA fused in.

Model size: 9B params (Safetensors, tensor type BF16)

Model tree for oldhag88/qwen3.5-9b-nsfw-captioning-v2:

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B
Finetuned: this model
Quantizations: 2 models