MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge Paper • 2507.21183 • Published Jul 27 • 14
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE Paper • 2507.21802 • Published Jul 29 • 15
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity Paper • 2507.21848 • Published Jul 29 • 8
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22 • 154
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4 • 73
Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting Paper • 2509.11452 • Published Sep 14 • 13