- OpenMMReasoner/OpenMMReasoner-ColdStart
  Image-Text-to-Text • 8B • Updated • 29 • 3
- OpenMMReasoner/OpenMMReasoner-RL
  Image-Text-to-Text • 8B • Updated • 31 • 5
- OpenMMReasoner/OpenMMReasoner-SFT-874K
  Viewer • Updated • 874k • 38 • 3
- OpenMMReasoner/OpenMMReasoner-RL-74K
  Viewer • Updated • 74.7k • 24 • 3
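The SFT and RL datasets above are regular Hub datasets, so they can be pulled with the `datasets` library. A minimal sketch, assuming a `train` split exists (the split name and column layout are assumptions; check the dataset card):

```python
# Hedged sketch: stream OpenMMReasoner's SFT data without downloading all 874k rows.
# The split name "train" is an assumption; verify it on the dataset card.
from datasets import load_dataset

ds = load_dataset("OpenMMReasoner/OpenMMReasoner-SFT-874K", split="train", streaming=True)
first = next(iter(ds))
print(first.keys())  # inspect the column schema of one record
```

Streaming avoids materializing the full dataset on disk, which is convenient for a first look at the schema.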
AI & ML interests
Feeling and building multimodal intelligence.
A general evaluator for assessing model performance.
A model good at arbitrary types of visual input.
Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/
Some powerful image models.
https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5
- mvp-lab/LLaVA-OneVision-1.5-Instruct-Data
  Updated • 218k • 54
- mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M
  Viewer • Updated • 90.3M • 241k • 48
- lmms-lab/LLaVA-OneVision-1.5-8B-Instruct
  Image-Text-to-Text • 9B • Updated • 7.74k • 50
- lmms-lab/LLaVA-OneVision-1.5-4B-Instruct
  Image-Text-to-Text • 5B • Updated • 4.26k • 11
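Since the 8B and 4B checkpoints are tagged Image-Text-to-Text, one plausible way to try them is transformers' generic image-text-to-text pipeline. A hedged sketch: whether this checkpoint loads through the pipeline (or instead needs trust_remote_code=True or the GitHub codebase linked above) is an assumption, and the image URL is a placeholder:

```python
# Hedged sketch: querying LLaVA-OneVision-1.5-8B-Instruct via the generic
# image-text-to-text pipeline. Pipeline support for this checkpoint is assumed.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="lmms-lab/LLaVA-OneVision-1.5-8B-Instruct")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sample.jpg"},  # placeholder image URL
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
print(pipe(text=messages, max_new_tokens=64))
```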
MMSearch-R1 is a framework for training LMMs to perform on-demand multimodal search in real-world environments.
CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/
A collection of sparse autoencoders (SAEs) hooked on LLaVA.
Models focused on video understanding (previously known as LLaVA-NeXT-Video).
Dataset Collection of LMMs-Eval
- LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
  Paper • 2407.07895 • Published • 42
- lmms-lab/llava-next-interleave-qwen-7b
  Text Generation • 8B • Updated • 624 • 27
- lmms-lab/llava-next-interleave-qwen-7b-dpo
  Text Generation • 8B • Updated • 254 • 12
- lmms-lab/M4-Instruct-Data
  Updated • 742 • 75
Lite versions of the datasets to accelerate holistic evaluation during model development!