AT^2PO: Agentic Turn-based Policy Optimization via Tree Search Paper • 2601.04767 • Published 2 days ago
Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning? Paper • 2510.06036 • Published Oct 7, 2025