🧩 Prompt Format (Chat Template)
During inference, each question is formatted as:
{question}
Please reason step by step, and put your final answer within \boxed{}.
The formatted question is then wrapped using the chat template:
prompt = tokenizer.apply_chat_template(
    [{"content": question_with_instruction, "role": "user"}],
    tokenize=False,
    add_generation_prompt=True,
)
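For reference, printing `prompt` shows exactly what the model consumes. For Qwen-family tokenizers this typically follows the ChatML convention (illustrative sketch only; the exact special tokens are defined by the model's own chat template):

print(prompt)
# Illustrative output for a ChatML-style template:
# <|im_start|>user
# {question_with_instruction}<|im_end|>
# <|im_start|>assistant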
🧪 Example Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-27")
tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-27")

# Raw string keeps LaTeX escapes such as \theta intact.
question = r"Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
question_with_instruction = question + "\nPlease reason step by step, and put your final answer within \\boxed{}."

# Apply chat template
prompt = tokenizer.apply_chat_template(
    [{"content": question_with_instruction, "role": "user"}],
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
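Because the model is instructed to put its final answer inside \boxed{}, the answer can be recovered from the decoded text with a small brace-aware scan. The helper below is a minimal sketch (extract_boxed_answer is our name, not part of the model card); unlike a plain regex, it handles nested braces such as \frac{\pi}{2}:

def extract_boxed_answer(text: str):
    """Return the contents of the last \\boxed{...} in `text`, or None."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth, chars = 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: the generation was likely truncated

answer = extract_boxed_answer(tokenizer.decode(outputs[0], skip_special_tokens=True))
# For the example question above, a correct generation yields something like
# "\left( 3, \frac{\pi}{2} \right)".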
📖 Reference
If you find this model useful, please consider citing our paper:
📄 Paper Link: https://huggingface.co/papers/2510.00553
@misc{cai2025predictabilityreinforcementlearningdynamics,
  title={On Predictability of Reinforcement Learning Dynamics for Large Language Models},
  author={Yuchen Cai and Ding Cao and Xin Xu and Zijun Yao and Yuqing Huang and Zhenyu Tan and Benyi Zhang and Guiquan Liu and Junfeng Fang},
  year={2025},
  eprint={2510.00553},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.00553},
}
Base model: Qwen/Qwen3-8B-Base