Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published 12 days ago
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published 10 days ago
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published 9 days ago
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published 18 days ago
World Simulation with Video Foundation Models for Physical AI Paper • 2511.00062 • Published 23 days ago
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published 14 days ago
LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation Paper • 2511.03001 • Published 16 days ago
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published 15 days ago
EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities Paper • 2510.27545 • Published 20 days ago
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published 16 days ago
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization Paper • 2510.25616 • Published 22 days ago
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published 16 days ago
UniREditBench: A Unified Reasoning-based Image Editing Benchmark Paper • 2511.01295 • Published 17 days ago
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph Paper • 2511.00086 • Published 22 days ago