Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs Paper • 2506.14731 • Published Jun 17
Hummer: Towards Limited Competitive Preference Dataset Paper • 2405.11647 • Published May 19, 2024