Learning to Reason as Action Abstractions with Scalable Mid-Training RL Paper • 2509.25810 • Published Sep 30, 2025
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning Paper • 2505.20561 • Published May 26, 2025
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment Paper • 2405.19332 • Published May 29, 2024