Welcome to EmoCaliber, an MLLM for reliable visual emotion comprehension.
Paper: EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration
Code / Project Page: https://github.com/wdqqdw/EmoCaliber
Given an image, EmoCaliber is trained to produce structured affective reasoning following this pipeline:

1. identifying prominent visual elements in the image;
2. providing detailed descriptions of human subjects, if present;
3. describing contextual elements beyond the subjects;
4. discussing how these elements interact;
5. deriving an emotional conclusion based on the preceding observations.

The final emotion prediction integrates these visual cues. After outputting the prediction, EmoCaliber also emits a confidence score wrapped in a <confidence> tag, which reflects the model's self-assessed certainty about its answer.
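Concretely, a response should look something like the following. The reasoning text, the predicted label, and the 0–1 confidence scale here are illustrative only; the source prescribes just the tag structure:

```
<think>A child laughs while chasing bubbles in a sunlit park; the bright palette and open posture suggest a lighthearted mood.</think><answer>amusement</answer><confidence>0.9</confidence>
```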
EmoCaliber is implemented on top of Qwen2.5-VL-7B, so both inference and training follow the same workflow as the base model.
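Because the architecture matches the base model, off-the-shelf Qwen2.5-VL tooling applies. Below is a minimal inference sketch using transformers and qwen-vl-utils; the model ID is a placeholder for the released checkpoint, and the prompt follows the emotion-recognition template shown below:

```python
# Minimal inference sketch (assumptions: transformers >= 4.49 and qwen-vl-utils
# are installed; "wdqqdw/EmoCaliber" is a placeholder for the released checkpoint).
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

MODEL_ID = "wdqqdw/EmoCaliber"  # placeholder -- substitute the actual checkpoint

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Emotion-recognition template (see "Standard prompt templates" below);
# 'EMOTION_CATEGORIES' and 'IMAGE_PATH' are the placeholders from the template.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "IMAGE_PATH"},
        {"type": "text", "text": (
            "Which emotion might this image evoke? Choose the most likely one "
            "from ['EMOTION_CATEGORIES']. Think step by step. Respond in the "
            "format: <think>{your reasoning}</think>"
            "<answer>{your final answer}</answer>."
        )},
    ],
}]

# Standard Qwen2.5-VL preprocessing: render the chat template, then collect
# the image inputs referenced in the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

out_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```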
Standard prompt templates:
For emotion recognition:
```json
{
  "conversations": [
    {
      "role": "user",
      "content": [
        {"type": "image", "image": "IMAGE_PATH"},
        {
          "type": "text",
          "text": "Which emotion might this image evoke? Choose the most likely one from ['EMOTION_CATEGORIES']. Think step by step. Respond in the format: <think>{your reasoning}</think><answer>{your final answer}</answer>."
        }
      ]
    }
  ]
}
```
For sentiment analysis:
```json
{
  "conversations": [
    {
      "role": "user",
      "content": [
        {"type": "image", "image": "IMAGE_PATH"},
        {
          "type": "text",
          "text": "What sentiment might this image evoke? Choose the most likely one from ['positive', 'negative']. Think step by step. Respond in the format: <think>{your reasoning}</think><answer>{your final answer}</answer>."
        }
      ]
    }
  ]
}
```
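Downstream code can recover the structured fields from a response with plain regular expressions. A small helper sketch, written under the assumption that the confidence value is a bare number inside the <confidence> tag (the exact scale depends on the checkpoint):

```python
import re

def parse_emocaliber_output(text: str) -> dict:
    """Extract reasoning, answer, and verbalized confidence from a response.

    Assumes the <think>/<answer> format from the templates above plus the
    <confidence> tag described earlier; any missing field is returned as None.
    """
    def grab(tag: str):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else None

    conf = grab("confidence")
    return {
        "reasoning": grab("think"),
        "answer": grab("answer"),
        # Convert confidence to float when it parses as a plain number.
        "confidence": float(conf) if conf and re.fullmatch(r"\d+(\.\d+)?", conf) else conf,
    }

# Example with a made-up response string:
demo = ("<think>Smiling crowd, bright colors.</think>"
        "<answer>excitement</answer><confidence>0.9</confidence>")
print(parse_emocaliber_output(demo))
# -> {'reasoning': 'Smiling crowd, bright colors.', 'answer': 'excitement', 'confidence': 0.9}
```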