From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published 8 days ago • 64
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models Paper • 2510.12784 • Published 10 days ago • 19
LLaVA-OneVision-1.5 Collection https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5 • 9 items • Updated 3 days ago • 16
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2 • 121
"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries Paper • 2508.15752 • Published Aug 21 • 7
UI Agent Collection a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics • 423 items • Updated about 4 hours ago • 63
An Illusion of Progress? Assessing the Current State of Web Agents Paper • 2504.01382 • Published Apr 2 • 4
InternVL3.5 Collection This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated 26 days ago • 99
MolmoAct Data Mixture Collection All datasets for the MolmoAct (Multimodal Open Language Model for Action) release. • 4 items • Updated Sep 6 • 15
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency Paper • 2508.05615 • Published Aug 7 • 22
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7 • 136
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal Paper • 2508.05988 • Published Aug 8 • 19
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 186
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models Paper • 2507.12566 • Published Jul 16 • 14
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 184
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Paper • 2507.07982 • Published Jul 10 • 33