4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration Paper • 2506.22242 • Published Jun 27
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding Paper • 2508.11952 • Published Aug 16 • 1
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D Paper • 2503.22976 • Published Mar 29 • 3