- OpenMMReasoner/OpenMMReasoner-ColdStart
  Image-Text-to-Text • 8B • Updated • 29 • 3
- OpenMMReasoner/OpenMMReasoner-RL
  Image-Text-to-Text • 8B • Updated • 31 • 5
- OpenMMReasoner/OpenMMReasoner-SFT-874K
  Viewer • Updated • 874k • 38 • 3
- OpenMMReasoner/OpenMMReasoner-RL-74K
  Viewer • Updated • 74.7k • 24 • 3
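The SFT and RL datasets above are regular Hub datasets, so they can be pulled with the `datasets` library. A minimal sketch, assuming a `train` split exists (the split name and column layout are assumptions; check the dataset card):

```python
# Hedged sketch: stream OpenMMReasoner's SFT data without downloading all 874k rows.
# The split name "train" is an assumption; verify it on the dataset card.
from datasets import load_dataset

ds = load_dataset("OpenMMReasoner/OpenMMReasoner-SFT-874K", split="train", streaming=True)
first = next(iter(ds))
print(first.keys())  # inspect the column schema of one record
```

Streaming avoids materializing the full dataset on disk, which is convenient for a first look at the schema.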
AI & ML interests
Feeling and building multimodal intelligence.
A general evaluator for assessing model performance.
A model good at arbitrary types of visual input.
Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/
Some powerful image models.
https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5
- mvp-lab/LLaVA-OneVision-1.5-Instruct-Data
  Updated • 218k • 54
- mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M
  Viewer • Updated • 90.3M • 241k • 48
- lmms-lab/LLaVA-OneVision-1.5-8B-Instruct
  Image-Text-to-Text • 9B • Updated • 7.74k • 50
- lmms-lab/LLaVA-OneVision-1.5-4B-Instruct
  Image-Text-to-Text • 5B • Updated • 4.26k • 11
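Since the 8B and 4B checkpoints are tagged Image-Text-to-Text, one plausible way to try them is transformers' generic image-text-to-text pipeline. A hedged sketch: whether this checkpoint loads through the pipeline (or instead needs trust_remote_code=True or the GitHub codebase linked above) is an assumption, and the image URL is a placeholder:

```python
# Hedged sketch: querying LLaVA-OneVision-1.5-8B-Instruct via the generic
# image-text-to-text pipeline. Pipeline support for this checkpoint is assumed.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="lmms-lab/LLaVA-OneVision-1.5-8B-Instruct")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sample.jpg"},  # placeholder image URL
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
print(pipe(text=messages, max_new_tokens=64))
```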
MMSearch-R1 is a framework for training LMMs to perform on-demand multimodal search in real-world environments.
CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/
A collection of sparse autoencoders (SAEs) hooked on LLaVA.
Models focused on video understanding (previously known as LLaVA-NeXT-Video).
Dataset Collection of LMMs-Eval
- LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
  Paper • 2407.07895 • Published • 42
- lmms-lab/llava-next-interleave-qwen-7b
  Text Generation • 8B • Updated • 624 • 27
- lmms-lab/llava-next-interleave-qwen-7b-dpo
  Text Generation • 8B • Updated • 254 • 12
- lmms-lab/M4-Instruct-Data
  Updated • 742 • 75
Lite versions of the datasets to accelerate holistic evaluation during model development!