πŸ”§ Prompt Format (Chat Template)

During inference, each question is formatted as:

{question} Please reason step by step, and put your final answer within \boxed{}.

Then wrapped using the chat template:

```python
prompt = tokenizer.apply_chat_template(
    [{"content": question_with_instruction, "role": "user"}],
    tokenize=False,
    add_generation_prompt=True,
)
```
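As a rough illustration of what this produces, the sketch below renders the same message by hand with Qwen-style chat markers (an assumption about this model's template — in practice, always use `tokenizer.apply_chat_template`, which reads the real template from the tokenizer config):

```python
# Hypothetical manual rendering with Qwen-style markers (illustration only;
# the authoritative template lives in the tokenizer and may differ).
question_with_instruction = (
    "What is 2 + 2? Please reason step by step, "
    "and put your final answer within \\boxed{}."
)
prompt = (
    "<|im_start|>user\n"
    + question_with_instruction
    + "<|im_end|>\n"
    + "<|im_start|>assistant\n"  # add_generation_prompt=True appends this opener
)
print(prompt)
```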

πŸ§ͺ Example Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "caiyuchen/DAPO-step-27",
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
)
tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-27")

# A raw string keeps the LaTeX backslashes (\theta, \le, \pi) intact.
question = r"Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
question_with_instruction = (
    question + " Please reason step by step, and put your final answer within \\boxed{}."
)

# Apply chat template
prompt = tokenizer.apply_chat_template(
    [{"content": question_with_instruction, "role": "user"}],
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)  # leave room for step-by-step reasoning
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
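Since the model is instructed to wrap its final answer in `\boxed{}`, a small helper can pull that answer out of the decoded text. This is a sketch (the helper name and regex are ours, not part of the model's tooling) that handles one level of nested braces, enough for answers like `\boxed{\left(3,\frac{\pi}{2}\right)}`:

```python
import re

def extract_boxed_answer(text: str):
    """Return the content of the last \\boxed{...} in the text, or None.

    The regex allows one level of nested braces, which covers common
    LaTeX answers such as \\frac{\\pi}{2}.
    """
    matches = re.findall(r"\\boxed\{((?:[^{}]|\{[^{}]*\})*)\}", text)
    return matches[-1] if matches else None

output = r"So the answer is \boxed{\left(3,\frac{\pi}{2}\right)}."
print(extract_boxed_answer(output))  # -> \left(3,\frac{\pi}{2}\right)
```

Taking the last match is deliberate: step-by-step generations sometimes box intermediate results, and the final box is the one that holds the answer.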

πŸ“Ž Reference

If you find this model useful, please consider citing our paper:

πŸ”— Paper Link: https://huggingface.co/papers/2510.00553

@misc{cai2025predictabilityreinforcementlearningdynamics,
      title={On Predictability of Reinforcement Learning Dynamics for Large Language Models}, 
      author={Yuchen Cai and Ding Cao and Xin Xu and Zijun Yao and Yuqing Huang and Zhenyu Tan and Benyi Zhang and Guiquan Liu and Junfeng Fang},
      year={2025},
      eprint={2510.00553},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.00553}, 
}
πŸ“¦ Model Details

- Base model: Qwen/Qwen3-8B-Base (fine-tuned)
- Model size: 8B parameters
- Tensor type: BF16 (Safetensors)