ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability Paper • 2510.09062 • Published Oct 10, 2025
Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents Paper • 2406.18062 • Published Jun 26, 2024
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities Paper • 2410.18469 • Published Oct 24, 2024
Effective Skill Unlearning through Intervention and Abstention Paper • 2503.21730 • Published Mar 27, 2025
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models Paper • 2503.22048 • Published Mar 27, 2025