OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding Paper • 2512.23646 • Published 5 days ago • 14
Nested Browser-Use Learning for Agentic Information Seeking Paper • 2512.23647 • Published 5 days ago • 17
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web Paper • 2512.23044 • Published 5 days ago • 9
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published 8 days ago • 55
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published 5 days ago • 62
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation Paper • 2512.23705 • Published 4 days ago • 41
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published 5 days ago • 86
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents Paper • 2512.22322 • Published 8 days ago • 35
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators Paper • 2512.19682 • Published 11 days ago • 15
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers Paper • 2511.20123 • Published Nov 25, 2025 • 17
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published Nov 12, 2025 • 69
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published Nov 12, 2025 • 69 • 3
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published Oct 13, 2025 • 31
Generative Universal Verifier as Multimodal Meta-Reasoner Paper • 2510.13804 • Published Oct 15, 2025 • 25