Update README.md

README.md CHANGED

@@ -106,7 +106,15 @@ The Pile was deduplicated before being used to train Pile-T5.
 #### Training procedure
 
 Pile-T5 was trained with a batch size of approximately 1M tokens
-(2048 sequences of 512 tokens each), for a total of 2,000,000 steps.
+(2048 sequences of 512 tokens each), for a total of 2,000,000 steps. Pile-T5 was trained
+with the span-corruption objective.
+
+#### Training checkpoints
+
+Intermediate checkpoints for Pile-T5 are accessible within this repository.
+There are 200 checkpoints in total, spaced 10,000 steps apart. For T5x-native
+checkpoints that can be used for finetuning with the T5x library, refer to [here](https://huggingface.co/lintang/pile-t5-base-t5x/tree/main).
+
 
 ### Evaluations
 
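The hunk above states that Pile-T5 was trained with the span-corruption objective at roughly 1M tokens per batch (2048 sequences × 512 tokens = 1,048,576 tokens). As a rough illustration of what span corruption does to a training example, here is a minimal sketch; the 15% noise density, mean span length of 3, and `<extra_id_N>` sentinel naming follow the original T5 defaults and are assumptions here, not values confirmed by this commit.

```python
import random

def span_corrupt(tokens, noise_density=0.15, mean_span_len=3):
    """Toy span corruption: hide random spans behind sentinel tokens.

    The encoder sees the corrupted input; the decoder is trained to
    emit the dropped spans, each prefixed by its sentinel. Defaults
    follow T5 (15% noise, mean span 3) -- assumed, not confirmed here.
    """
    n_masked = max(1, round(len(tokens) * noise_density))
    n_spans = max(1, round(n_masked / mean_span_len))
    starts = sorted(random.sample(range(len(tokens)), n_spans))
    inputs, targets, cursor, sid = [], [], 0, 0
    for s in starts:
        if s < cursor:                  # skip spans overlapping the previous one
            continue
        inputs.extend(tokens[cursor:s])          # keep tokens up to the span
        sentinel = f"<extra_id_{sid}>"
        inputs.append(sentinel)                  # sentinel replaces the span
        targets.append(sentinel)                 # target: sentinel + hidden span
        targets.extend(tokens[s:s + mean_span_len])
        cursor = s + mean_span_len
        sid += 1
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{sid}>")          # closing sentinel, as in T5
    return inputs, targets

example = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(example)
print("encoder input :", " ".join(inp))
print("decoder target:", " ".join(tgt))
```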
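The new "Training checkpoints" section says the repository hosts 200 intermediate checkpoints at 10,000-step intervals. A minimal sketch of pulling one such checkpoint with the `transformers` library follows; the repository id and the `step{N}` revision naming are assumptions for illustration (check the repository's branch list for the actual scheme). The T5x-native checkpoints linked above are a separate format intended for finetuning with the T5x library itself.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical repo id and revision name -- the actual repository and
# branch naming for the 200 intermediate checkpoints may differ.
REPO = "EleutherAI/pile-t5-base"
STEP = 10_000

tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForSeq2SeqLM.from_pretrained(REPO, revision=f"step{STEP}")
print(f"loaded checkpoint at step {STEP}: {model.config.model_type}")
```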