Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation Paper • 2510.19592 • Published 4 days ago • 8
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos Paper • 2407.12987 • Published Jul 17, 2024
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Paper • 2507.07990 • Published Jul 10, 2025 • 45
Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization Paper • 2407.07024 • Published Jul 9, 2024
Detection Recovery in Online Multi-Object Tracking with Sparse Graph Tracker Paper • 2205.00968 • Published May 2, 2022