Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30, 2025 • 277
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis Paper • 2506.02096 • Published Jun 2, 2025 • 52
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation Paper • 2506.02397 • Published Jun 3, 2025 • 35
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30, 2025 • 143
Time Blindness: Why Video-Language Models Can't See What Humans Can? Paper • 2505.24867 • Published May 30, 2025 • 80
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published May 29, 2025 • 38
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos Paper • 2506.05349 • Published Jun 5, 2025 • 24
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning Paper • 2506.05331 • Published Jun 5, 2025 • 13
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21, 2025 • 53
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning Paper • 2505.18291 • Published May 23, 2025 • 2
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought Paper • 2505.16192 • Published May 22, 2025 • 12
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 77
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Paper • 2507.13158 • Published Jul 17, 2025 • 23
MIRIX: Multi-Agent Memory System for LLM-Based Agents Paper • 2507.07957 • Published Jul 10, 2025 • 79
Zep: A Temporal Knowledge Graph Architecture for Agent Memory Paper • 2501.13956 • Published Jan 20, 2025 • 8
Reasoning with Sampling: Your Base Model is Smarter Than You Think Paper • 2510.14901 • Published Oct 16, 2025 • 47
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Paper • 2512.07461 • Published 27 days ago • 74
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published 27 days ago • 36
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation Paper • 2512.05033 • Published about 1 month ago • 15