Update README.md
README.md
CHANGED
@@ -6,7 +6,7 @@ datasets:
 
 # cosmo2-tokenizer
 Tokenizer for the training of cosmo2. This tokenizer was trained on 1M samples from:
-- FineWeb-Edu
+- FineWeb-Edu 70%
 - Cosmopedia v2 15%
 - StarCoderData 8%
 - OpenWebMath 5%
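
For a sense of scale, the percentages in the updated README can be turned into approximate per-source sample counts. This is only a sketch: it assumes each percentage applies directly to the 1M samples, and the names `MIXTURE_PERCENT`, `TOTAL_SAMPLES`, and `samples_per_source` are hypothetical, not from the repository. The listed shares add up to 98%, and the README does not say what the remaining 2% consists of.

```python
# Mixture shares as listed in the updated README (percent of samples).
MIXTURE_PERCENT = {
    "FineWeb-Edu": 70,
    "Cosmopedia v2": 15,
    "StarCoderData": 8,
    "OpenWebMath": 5,
}

# "1M samples" from the README; assumed here to be the total the shares apply to.
TOTAL_SAMPLES = 1_000_000


def samples_per_source(total, mixture_percent):
    """Return the integer sample count implied by each percentage share."""
    return {name: total * pct // 100 for name, pct in mixture_percent.items()}


counts = samples_per_source(TOTAL_SAMPLES, MIXTURE_PERCENT)

# Share not attributed to any listed source in the README.
unlisted_percent = 100 - sum(MIXTURE_PERCENT.values())

for name, count in counts.items():
    print(f"{name}: {count:,} samples")
print(f"Unlisted share: {unlisted_percent}%")
```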