Commit bc2764f
Parent(s): ff46155

fix typos (#6)

- fix typos (c2a5e573587885ce23744cf330ee7c402f0df16f)

Co-authored-by: George Ogden <[email protected]>
README.md
CHANGED
@@ -42,7 +42,7 @@ interests you.
 
 Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
 to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at model like GPT2.
+generation you should look at a model like GPT2.
 
 ### How to use
 
@@ -166,14 +166,14 @@ The RoBERTa model was pretrained on the reunion of five datasets:
 - [Stories](https://arxiv.org/abs/1806.02847) a dataset containing a subset of CommonCrawl data filtered to match the
   story-like style of Winograd schemas.
 
-Together
+Together these datasets weigh 160GB of text.
 
 ## Training procedure
 
 ### Preprocessing
 
 The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of
-the model take pieces of 512 contiguous
+the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked
 with `<s>` and the end of one by `</s>`
 
 The details of the masking procedure for each sentence are the following:
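For context, a minimal sketch of the fine-tuning-oriented usage the first hunk describes, assuming the checkpoint id `roberta-base` and the Hugging Face `transformers` library (neither is named in the diff itself):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# "roberta-base" is an assumed checkpoint id, used only for illustration.
checkpoint = "roberta-base"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# A fresh classification head on top of the pretrained encoder, ready to fine-tune.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

inputs = tokenizer("This film was surprisingly good.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]) -- one untrained score per label
```

For text generation, as the corrected line notes, an autoregressive model such as GPT2 (loaded via `AutoModelForCausalLM`) is the better fit.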
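Likewise, a short sketch of the preprocessing the second hunk describes: a byte-level BPE tokenizer with a vocabulary of roughly 50,000 entries, inputs of up to 512 contiguous tokens, and documents delimited by `<s>` and `</s>` (again assuming the `roberta-base` checkpoint):

```python
from transformers import AutoTokenizer

# Assumed checkpoint id, used only to illustrate the preprocessing described above.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

print(tokenizer.vocab_size)                      # ~50,000 byte-level BPE entries
print(tokenizer.bos_token, tokenizer.eos_token)  # <s> </s>
print(tokenizer.model_max_length)                # 512 positions per input piece

# A new document starts with <s> and ends with </s>; longer text is truncated to 512 tokens.
encoding = tokenizer("A short document.", truncation=True, max_length=512)
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# e.g. ['<s>', 'A', 'Ġshort', 'Ġdocument', '.', '</s>']
```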