Qwen/Qwen3-30B-A3B-Thinking-2507

feihu.hf committed on Aug 7

Commit 0ac4048 · 1 Parent(s): 4baa2f7

update README

Files changed (1)

README.md +1 -0

@@ -339,6 +339,7 @@ We test the model on an 1M version of the [RULER](https://arxiv.org/abs/2404.066
 * All models are evaluated with Dual Chunk Attention enabled.
 * Since the evaluation is time-consuming, we use 260 samples for each length (13 sub-tasks, 20 samples for each).
+* To avoid overly verbose reasoning, we set the thinking budget to 8,192 tokens.
 ## Best Practices
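
The evaluation settings named in this hunk can be sketched in a few lines. This is only an illustration under stated assumptions: the README does not say how the thinking budget is enforced, so `clip_thinking`, its post-hoc truncation strategy, and the use of `</think>` as a plain token string are hypothetical. In practice a budget is more likely applied during generation than afterwards.

```python
# Evaluation settings quoted in the README notes in this diff.
NUM_SUBTASKS = 13
SAMPLES_PER_SUBTASK = 20
THINKING_BUDGET = 8192

# 13 sub-tasks x 20 samples each = 260 samples per context length.
SAMPLES_PER_LENGTH = NUM_SUBTASKS * SAMPLES_PER_SUBTASK


def clip_thinking(tokens, budget=THINKING_BUDGET, end_marker="</think>"):
    """Hypothetical helper: cap the reasoning segment at `budget` tokens.

    `tokens` is a list of token strings. Everything before `end_marker`
    is treated as reasoning, everything after it as the final answer.
    """
    if end_marker in tokens:
        cut = tokens.index(end_marker)
        thinking, answer = tokens[:cut], tokens[cut + 1:]
    else:
        # The reasoning was never closed: treat all tokens as thinking.
        thinking, answer = tokens, []
    # Keep at most `budget` reasoning tokens, then close the reasoning block.
    return thinking[:budget] + [end_marker] + answer
```

For example, `clip_thinking(["a", "b", "c", "</think>", "x"], budget=2)` keeps two reasoning tokens, re-inserts the end marker, and leaves the answer untouched.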