Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published 3 days ago • 21
A^2Search: Ambiguity-Aware Question Answering with Reinforcement Learning Paper • 2510.07958 • Published 16 days ago • 4
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts Paper • 2510.19363 • Published 3 days ago • 54
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping Paper • 2510.18927 • Published 4 days ago • 77
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning Paper • 2510.19338 • Published 3 days ago • 90
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published 4 days ago • 99
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published 15 days ago • 47
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published 12 days ago • 30
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs Paper • 2510.13795 • Published 10 days ago • 49
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs Paper • 2510.11062 • Published 12 days ago • 25
Generative Universal Verifier as Multimodal Meta-Reasoner Paper • 2510.13804 • Published 10 days ago • 24
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization Paper • 2510.13554 • Published 10 days ago • 54
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? Paper • 2510.08189 • Published 16 days ago • 25