Update README.md
README.md (CHANGED)
@@ -6,10 +6,11 @@ language:
 pipeline_tag: text2text-generation
 tags:
 - t5x
--
+- encoder-decoder
 ---

-Pile-T5 Base is an Encoder-Decoder model trained on [the Pile](https://pile.eleuther.ai/) using the [T5x](https://github.com/google-research/t5x) library. The model was trained for 2 million steps or roughly 2 trillion tokens using MLM-objective similar to the original T5 model.
+Pile-T5 Base is an Encoder-Decoder model trained on [the Pile](https://pile.eleuther.ai/) using the [T5x](https://github.com/google-research/t5x) library. The model was trained for 2 million steps, or roughly 2 trillion tokens, with an MLM objective similar to the original T5 model.
+The HF version of Pile-T5 Base borrows UMT5's model implementation, as it uses the scalable model implementation from T5x, and uses `LlamaTokenizer`.

 ### Model Details

@@ -30,7 +31,7 @@ ai](mailto:[email protected]).

 | Hyperparameter             | Value       |
 | -------------------------- | ----------- |
-| n<sub>parameters</sub>     |             |
+| n<sub>parameters</sub>     | 247586304   |
 | n<sub>encoder layers</sub> | 12          |
 | n<sub>decoder layers</sub> | 12          |
 | d<sub>model</sub>          | 2048        |
@@ -133,16 +134,18 @@ checkpoints that can be used for finetuning with the T5x library, refer to [here

 ### Evaluations

-
+Pile-T5 Base was evaluated on SuperGLUE and CodeXGLUE. A Flan-finetuned version was evaluated on Flan Held In tasks.
+Results can be seen in the [blogpost](https://blog.eleuther.ai/pile-t5/).

 ### BibTeX

 ```
-@
+@misc{2024PileT5,
   author = {Lintang Sutawika and Aran Komatsuzaki and Colin Raffel},
-  title = {Pile
+  title = {Pile-T5},
   year = {2024},
-  url = {}
+  url = {https://blog.eleuther.ai/pile-t5/},
+  note = {Blog post},
 }
 ```
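The added paragraph notes that the HF port of Pile-T5 Base reuses UMT5's model implementation and `LlamaTokenizer`, so it should load through the standard `transformers` seq2seq API. A minimal usage sketch, assuming the checkpoint is published on the Hub as `EleutherAI/pile-t5-base` (the repo id is not stated in this diff) and that the auto classes resolve the UMT5-style architecture:

```python
# Minimal sketch, not taken from the card: load via the generic seq2seq
# auto classes and run a short generation.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "EleutherAI/pile-t5-base"  # assumed Hub id, not given in this diff

tokenizer = AutoTokenizer.from_pretrained(model_id)      # LlamaTokenizer under the hood, per the card
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)  # UMT5-style encoder-decoder, per the card

# The hyperparameter table above lists n_parameters = 247586304.
print(f"parameter count: {model.num_parameters():,}")

# Pile-T5 is pretrained with a span-corruption (MLM) objective, not
# instruction tuning, so treat raw generations as infilling-style output.
inputs = tokenizer("The Pile is a large, diverse", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For reference, the training budget quoted above (2 trillion tokens over 2 million steps) works out to roughly 1 million tokens per optimizer step; for the original T5x checkpoints and finetuning setup, the card defers to the T5x library.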