Model Card for mlm-filter-qwen2.5-VL-3B
This model is trained on a scoring dataset and can be used to score English image-text pairs. It supports four dimensions: image_text_matching, object_detail_fulfillment, caption_text_quality, and semantic_understanding. The base model is Qwen2.5 VL-instruct-3B. Since most of the publicly released models by the original authors are based on custom architectures, it is inconvenient to perform inference with vLLM. Therefore, we trained Qwen2.5-VL on the same data to fully support vLLM inference and accelerate inference speed.
该模型基于数据集进行训练,可用于对英语图文对进行评分。它支持四个维度:图文匹配(image_text_matching)、细节符合度(object_detail_fulfillment)、文本质量(caption_text_quality)以及语义理解(semantic_understanding)。基础模型为 Qwen2.5 VL-instruct-3B。由于原作者公开的模型大多基于自定义架构,使用vllm推理不方便,故我们使用相同数据训练了Qwen2.5 VL,以全面的支持vllm推理,以加快推理速度。
The dataset used is weizhiwang/mlm_filter_instructions, and the inference prompt can be referenced from the github link from the original authors MLM-Filter.
使用的数据集为weizhiwang/mlm_filter_instructions,推理prompt可参考原作者的代码MLM-Filter,
- Downloads last month
- 1
Model tree for aiLifeAgain/mlm-filter-qwen2.5-VL-3B
Base model
Qwen/Qwen2.5-VL-3B-Instruct