Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values Paper • 2510.20187 • Published Oct 23 • 18
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 75
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31 • 84
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding Paper • 2408.08252 • Published Aug 15, 2024 • 1
Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning Paper • 2305.04819 • Published May 8, 2023