YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Demystifying Reinforcement Learning in Agentic Reasoning

Paper on arXiv Open-AgentRL on GitHub 30K RL Dataset DemyAgent-4B Model

🎯 About This Repository

This repository contains the Qwen2.5-7B-RA-SFT model weights, a 7B-sized agentic reasoning model that is finetuned with our 3k Agentic SFT dataset, based on Qwen2.5-7B-Instruct.

🌟 Introduction

In our work, we systematically investigate three dimensions of agentic RL: data, algorithms, and reasoning modes. Our findings reveal:

  • 🎯 Data Quality Matters: Real end-to-end trajectories and high-diversity datasets significantly outperform synthetic alternatives
  • ⚑ Training Efficiency: Exploration-friendly techniques like reward clipping and entropy maintenance boost training efficiency
  • 🧠 Reasoning Strategy: Deliberative reasoning with selective tool calls surpasses frequent invocation or verbose self-reasoning We contribute high-quality SFT and RL datasets, demonstrating that simple recipes enable even 4B models to outperform 32B models on the most challenging reasoning benchmarks.

πŸ“¦ Resources

Type Name Link
πŸ“Š Dataset 3K Agentic SFT Data πŸ€— HuggingFace
πŸ“Š Dataset 30K Agentic RL Data πŸ€— HuggingFace
πŸ€– Model Qwen2.5-7B-RA-SFT πŸ€— HuggingFace
πŸ€– Model Qwen3-4B-RA-SFT πŸ€— HuggingFace
πŸ€– Model DemyAgent-4B πŸ€— HuggingFace

πŸ“ Citation

@article{yu2025demystify,
  title={Demystifying Reinforcement Learning in Agentic Reasoning},
  author={Yu, Zhaochen and Yang, Ling and Zou, Jiaru and Yan, Shuicheng and Wang, Mengdi},
  journal={arXiv preprint arXiv:2510.11701},
  year={2025}
}
Downloads last month
14
Safetensors
Model size
8B params
Tensor type
BF16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Collection including Gen-Verse/Qwen2.5-7B-RA-SFT