Commit 952e4a1 · Parent(s): a1ea5cf
Update README.md

README.md CHANGED
@@ -27,21 +27,6 @@ Secondly, a single GPU will most likely not have enough memory to even load the
-Model parallelism has to be used here to overcome this problem as is explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
-DeepSpeed's ZeRO-Offload is another approach as explained in this [post](https://github.com/huggingface/transformers/issues/9996).
 
----
-language:
-- en
-- fr
-- ro
-- de
-datasets:
-- c4
-tags:
-- summarization
-- translation
-
-license: apache-2.0
----
-
 [Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
 
 ## PreTraining
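The removed lines above pointed at two approaches for a model too large for a single GPU: model parallelism and DeepSpeed's ZeRO-Offload. As a hedged illustration (not part of this commit), a minimal DeepSpeed configuration fragment that offloads optimizer state to CPU, in the spirit of ZeRO-Offload, might look like:

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    }
  }
}
```

The exact invocation depends on the training script and DeepSpeed version; such a file is typically passed to the launcher or to a framework integration such as the Hugging Face `Trainer`.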