feihu.hf committed on
Commit 0ac4048 · 1 Parent(s): 4baa2f7
update README
README.md
CHANGED
@@ -339,6 +339,7 @@ We test the model on an 1M version of the [RULER](https://arxiv.org/abs/2404.066
 
 * All models are evaluated with Dual Chunk Attention enabled.
 * Since the evaluation is time-consuming, we use 260 samples for each length (13 sub-tasks, 20 samples for each).
+* To avoid overly verbose reasoning, we set the thinking budget to 8,192 tokens.
 
 ## Best Practices
 