Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Paper • 2506.09501 • Published Jun 11 • 18
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models Paper • 2505.22662 • Published May 28 • 6
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning Paper • 2505.24846 • Published May 30 • 15
Rethinking Diverse Human Preference Learning through Principal Component Analysis Paper • 2502.13131 • Published Feb 18 • 37
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models Paper • 2411.00836 • Published Oct 29, 2024 • 15
Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback Text Classification • 7B • Updated Feb 5 • 268 • 11