The Landscape of Agentic Reinforcement Learning for LLMs: A Survey • Paper • 2509.02547 • Published Sep 2 • 217
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning • Paper • 2509.02479 • Published Sep 2 • 83
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion • Paper • 2509.01215 • Published Sep 1 • 50
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model • Paper • 2509.00676 • Published Aug 31 • 83
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning • Paper • 2509.02544 • Published Sep 2 • 121
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use • Paper • 2509.01055 • Published Sep 1 • 71
Baichuan-M2: Scaling Medical Capability with Large Verifier System • Paper • 2509.02208 • Published Sep 2 • 40
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR • Paper • 2509.02522 • Published Sep 2 • 25
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic • Paper • 2509.01363 • Published Sep 1 • 57
Jointly Reinforcing Diversity and Quality in Language Model Generations • Paper • 2509.02534 • Published Sep 2 • 25
GenCompositor: Generative Video Compositing with Diffusion Transformer • Paper • 2509.02460 • Published Sep 2 • 25
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning • Paper • 2509.01644 • Published Sep 1 • 33
Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation • Paper • 2509.02040 • Published Sep 2 • 14
M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision • Paper • 2509.01360 • Published Sep 1 • 11
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games • Paper • 2509.01052 • Published Sep 1 • 20
Universal Deep Research: Bring Your Own Model and Strategy • Paper • 2509.00244 • Published Aug 29 • 13
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing • Paper • 2509.01984 • Published Sep 2 • 6
MedDINOv3: How to adapt vision foundation models for medical image segmentation? • Paper • 2509.02379 • Published Sep 2 • 1
Improving Large Vision and Language Models by Learning from a Panel of Peers • Paper • 2509.01610 • Published Sep 1 • 2
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views • Paper • 2509.01250 • Published Sep 1 • 2
SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction • Paper • 2509.00581 • Published Aug 30 • 9
C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection • Paper • 2509.00578 • Published Aug 30 • 1
Metis: Training Large Language Models with Advanced Low-Bit Quantization • Paper • 2509.00404 • Published Aug 30 • 5
FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models • Paper • 2508.20586 • Published Aug 28 • 3