🧩 Prompt Format (Chat Template)
During inference, each question is formatted as:
{question}
Please reason step by step, and put your final answer within \boxed{}.
The formatted question is then wrapped using the chat template:
prompt = tokenizer.apply_chat_template(
    [{"content": question_with_instruction, "role": "user"}],
    tokenize=False,
    add_generation_prompt=True,
)
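For reference, printing `prompt` shows exactly what the model consumes. For Qwen-family tokenizers this typically follows the ChatML convention (illustrative sketch only; the exact special tokens are defined by the model's own chat template):

print(prompt)
# Illustrative output for a ChatML-style template:
# <|im_start|>user
# {question_with_instruction}<|im_end|>
# <|im_start|>assistant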
🧪 Example Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-27")
tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-27")

# Raw string keeps LaTeX escapes such as \theta intact.
question = r"Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
question_with_instruction = question + "\nPlease reason step by step, and put your final answer within \\boxed{}."

# Apply chat template
prompt = tokenizer.apply_chat_template(
    [{"content": question_with_instruction, "role": "user"}],
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
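Because the model is instructed to put its final answer inside \boxed{}, the answer can be recovered from the decoded text with a small brace-aware scan. The helper below is a minimal sketch (extract_boxed_answer is our name, not part of the model card); unlike a plain regex, it handles nested braces such as \frac{\pi}{2}:

def extract_boxed_answer(text: str):
    """Return the contents of the last \\boxed{...} in `text`, or None."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth, chars = 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: the generation was likely truncated

answer = extract_boxed_answer(tokenizer.decode(outputs[0], skip_special_tokens=True))
# For the example question above, a correct generation yields something like
# "\left( 3, \frac{\pi}{2} \right)".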
📖 Reference
If you find this model useful, please consider citing our paper:
📄 Paper Link: https://huggingface.co/papers/2510.00553
@misc{cai2025predictabilityreinforcementlearningdynamics,
  title={On Predictability of Reinforcement Learning Dynamics for Large Language Models},
  author={Yuchen Cai and Ding Cao and Xin Xu and Zijun Yao and Yuqing Huang and Zhenyu Tan and Benyi Zhang and Guiquan Liu and Junfeng Fang},
  year={2025},
  eprint={2510.00553},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.00553},
}
Base model: Qwen/Qwen3-8B-Base