Back to Basics: Let Denoising Generative Models Denoise • Paper • 2511.13720 • Published 13 days ago • 59
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective • Paper • 2509.18905 • Published Sep 23 • 29
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts • Paper • 2507.20939 • Published Jul 28 • 56
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics • Paper • 2506.04308 • Published Jun 4 • 43
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion • Paper • 2503.22262 • Published Mar 28 • 1