LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning Paper • 2509.24786 • Published Sep 29
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding Paper • 2504.18152 • Published Apr 25
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Paper • 2506.21277 • Published Jun 26
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding Paper • 2501.15111 • Published Jan 25
ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation Paper • 2308.09242 • Published Aug 18, 2023
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models Paper • 2410.19635 • Published Oct 25, 2024
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models Paper • 2501.18954 • Published Jan 31