Open-AgentRL
Collection
Demystifying Reinforcement Learning in Agentic Reasoning
β’
6 items
β’
Updated
β’
2
This repository contains the Qwen2.5-7B-RA-SFT model weights, a 7B-sized agentic reasoning model that is finetuned with our 3k Agentic SFT dataset, based on Qwen2.5-7B-Instruct.
In our work, we systematically investigate three dimensions of agentic RL: data, algorithms, and reasoning modes. Our findings reveal:
| Type | Name | Link |
|---|---|---|
| π Dataset | 3K Agentic SFT Data | π€ HuggingFace |
| π Dataset | 30K Agentic RL Data | π€ HuggingFace |
| π€ Model | Qwen2.5-7B-RA-SFT | π€ HuggingFace |
| π€ Model | Qwen3-4B-RA-SFT | π€ HuggingFace |
| π€ Model | DemyAgent-4B | π€ HuggingFace |
@article{yu2025demystify,
title={Demystifying Reinforcement Learning in Agentic Reasoning},
author={Yu, Zhaochen and Yang, Ling and Zou, Jiaru and Yan, Shuicheng and Wang, Mengdi},
journal={arXiv preprint arXiv:2510.11701},
year={2025}
}