Jendersen committed on
Commit 8a37fa0 · verified · 1 Parent(s): 496f40e

Upload translation_mt5_k.ipynb

Files changed (1)
  1. translation_mt5_k.ipynb +1778 -0
translation_mt5_k.ipynb ADDED
@@ -0,0 +1,1778 @@
1
+ {
2
+ "nbformat": 4,
3
+ "nbformat_minor": 0,
4
+ "metadata": {
5
+ "colab": {
6
+ "provenance": [],
7
+ "gpuType": "T4"
8
+ },
9
+ "kernelspec": {
10
+ "name": "python3",
11
+ "display_name": "Python 3"
12
+ },
13
+ "language_info": {
14
+ "name": "python"
15
+ },
16
+ "accelerator": "GPU",
17
+ "widgets": {
18
+ "application/vnd.jupyter.widget-state+json": {
19
+ "8a49fb7918744d23bf64237c2674fc6b": {
20
+ "model_module": "@jupyter-widgets/controls",
21
+ "model_name": "HBoxModel",
22
+ "model_module_version": "1.5.0",
23
+ "state": {
24
+ "_dom_classes": [],
25
+ "_model_module": "@jupyter-widgets/controls",
26
+ "_model_module_version": "1.5.0",
27
+ "_model_name": "HBoxModel",
28
+ "_view_count": null,
29
+ "_view_module": "@jupyter-widgets/controls",
30
+ "_view_module_version": "1.5.0",
31
+ "_view_name": "HBoxView",
32
+ "box_style": "",
33
+ "children": [
34
+ "IPY_MODEL_7a64e72d88084b6897d53c344aad4358",
35
+ "IPY_MODEL_85b34cfea65c4a8dbdae30a8eec22a98",
36
+ "IPY_MODEL_500b7fb2607c4eebbb5bea0bfc4e33ff"
37
+ ],
38
+ "layout": "IPY_MODEL_b53f4a68898e45098d6801e4987e4c10"
39
+ }
40
+ },
41
+ "7a64e72d88084b6897d53c344aad4358": {
42
+ "model_module": "@jupyter-widgets/controls",
43
+ "model_name": "HTMLModel",
44
+ "model_module_version": "1.5.0",
45
+ "state": {
46
+ "_dom_classes": [],
47
+ "_model_module": "@jupyter-widgets/controls",
48
+ "_model_module_version": "1.5.0",
49
+ "_model_name": "HTMLModel",
50
+ "_view_count": null,
51
+ "_view_module": "@jupyter-widgets/controls",
52
+ "_view_module_version": "1.5.0",
53
+ "_view_name": "HTMLView",
54
+ "description": "",
55
+ "description_tooltip": null,
56
+ "layout": "IPY_MODEL_008567f59c13464b82ca3946ccf0058c",
57
+ "placeholder": "​",
58
+ "style": "IPY_MODEL_6ac91c5649bc4b6fa0d3860e71cdb01b",
59
+ "value": "Map: 100%"
60
+ }
61
+ },
62
+ "85b34cfea65c4a8dbdae30a8eec22a98": {
63
+ "model_module": "@jupyter-widgets/controls",
64
+ "model_name": "FloatProgressModel",
65
+ "model_module_version": "1.5.0",
66
+ "state": {
67
+ "_dom_classes": [],
68
+ "_model_module": "@jupyter-widgets/controls",
69
+ "_model_module_version": "1.5.0",
70
+ "_model_name": "FloatProgressModel",
71
+ "_view_count": null,
72
+ "_view_module": "@jupyter-widgets/controls",
73
+ "_view_module_version": "1.5.0",
74
+ "_view_name": "ProgressView",
75
+ "bar_style": "success",
76
+ "description": "",
77
+ "description_tooltip": null,
78
+ "layout": "IPY_MODEL_90bfdbd03ac44d5ca3df23ebcc02d863",
79
+ "max": 74586,
80
+ "min": 0,
81
+ "orientation": "horizontal",
82
+ "style": "IPY_MODEL_6618b65302df46098bd2ad029ed668eb",
83
+ "value": 74586
84
+ }
85
+ },
86
+ "500b7fb2607c4eebbb5bea0bfc4e33ff": {
87
+ "model_module": "@jupyter-widgets/controls",
88
+ "model_name": "HTMLModel",
89
+ "model_module_version": "1.5.0",
90
+ "state": {
91
+ "_dom_classes": [],
92
+ "_model_module": "@jupyter-widgets/controls",
93
+ "_model_module_version": "1.5.0",
94
+ "_model_name": "HTMLModel",
95
+ "_view_count": null,
96
+ "_view_module": "@jupyter-widgets/controls",
97
+ "_view_module_version": "1.5.0",
98
+ "_view_name": "HTMLView",
99
+ "description": "",
100
+ "description_tooltip": null,
101
+ "layout": "IPY_MODEL_a04d20d2b4934f35a7890f24546023d7",
102
+ "placeholder": "​",
103
+ "style": "IPY_MODEL_5f5df2985c6f49d3816888ec4ebd9add",
104
+ "value": " 74586/74586 [00:44<00:00, 1302.29 examples/s]"
105
+ }
106
+ },
107
+ "b53f4a68898e45098d6801e4987e4c10": {
108
+ "model_module": "@jupyter-widgets/base",
109
+ "model_name": "LayoutModel",
110
+ "model_module_version": "1.2.0",
111
+ "state": {
112
+ "_model_module": "@jupyter-widgets/base",
113
+ "_model_module_version": "1.2.0",
114
+ "_model_name": "LayoutModel",
115
+ "_view_count": null,
116
+ "_view_module": "@jupyter-widgets/base",
117
+ "_view_module_version": "1.2.0",
118
+ "_view_name": "LayoutView",
119
+ "align_content": null,
120
+ "align_items": null,
121
+ "align_self": null,
122
+ "border": null,
123
+ "bottom": null,
124
+ "display": null,
125
+ "flex": null,
126
+ "flex_flow": null,
127
+ "grid_area": null,
128
+ "grid_auto_columns": null,
129
+ "grid_auto_flow": null,
130
+ "grid_auto_rows": null,
131
+ "grid_column": null,
132
+ "grid_gap": null,
133
+ "grid_row": null,
134
+ "grid_template_areas": null,
135
+ "grid_template_columns": null,
136
+ "grid_template_rows": null,
137
+ "height": null,
138
+ "justify_content": null,
139
+ "justify_items": null,
140
+ "left": null,
141
+ "margin": null,
142
+ "max_height": null,
143
+ "max_width": null,
144
+ "min_height": null,
145
+ "min_width": null,
146
+ "object_fit": null,
147
+ "object_position": null,
148
+ "order": null,
149
+ "overflow": null,
150
+ "overflow_x": null,
151
+ "overflow_y": null,
152
+ "padding": null,
153
+ "right": null,
154
+ "top": null,
155
+ "visibility": null,
156
+ "width": null
157
+ }
158
+ },
159
+ "008567f59c13464b82ca3946ccf0058c": {
160
+ "model_module": "@jupyter-widgets/base",
161
+ "model_name": "LayoutModel",
162
+ "model_module_version": "1.2.0",
163
+ "state": {
164
+ "_model_module": "@jupyter-widgets/base",
165
+ "_model_module_version": "1.2.0",
166
+ "_model_name": "LayoutModel",
167
+ "_view_count": null,
168
+ "_view_module": "@jupyter-widgets/base",
169
+ "_view_module_version": "1.2.0",
170
+ "_view_name": "LayoutView",
171
+ "align_content": null,
172
+ "align_items": null,
173
+ "align_self": null,
174
+ "border": null,
175
+ "bottom": null,
176
+ "display": null,
177
+ "flex": null,
178
+ "flex_flow": null,
179
+ "grid_area": null,
180
+ "grid_auto_columns": null,
181
+ "grid_auto_flow": null,
182
+ "grid_auto_rows": null,
183
+ "grid_column": null,
184
+ "grid_gap": null,
185
+ "grid_row": null,
186
+ "grid_template_areas": null,
187
+ "grid_template_columns": null,
188
+ "grid_template_rows": null,
189
+ "height": null,
190
+ "justify_content": null,
191
+ "justify_items": null,
192
+ "left": null,
193
+ "margin": null,
194
+ "max_height": null,
195
+ "max_width": null,
196
+ "min_height": null,
197
+ "min_width": null,
198
+ "object_fit": null,
199
+ "object_position": null,
200
+ "order": null,
201
+ "overflow": null,
202
+ "overflow_x": null,
203
+ "overflow_y": null,
204
+ "padding": null,
205
+ "right": null,
206
+ "top": null,
207
+ "visibility": null,
208
+ "width": null
209
+ }
210
+ },
211
+ "6ac91c5649bc4b6fa0d3860e71cdb01b": {
212
+ "model_module": "@jupyter-widgets/controls",
213
+ "model_name": "DescriptionStyleModel",
214
+ "model_module_version": "1.5.0",
215
+ "state": {
216
+ "_model_module": "@jupyter-widgets/controls",
217
+ "_model_module_version": "1.5.0",
218
+ "_model_name": "DescriptionStyleModel",
219
+ "_view_count": null,
220
+ "_view_module": "@jupyter-widgets/base",
221
+ "_view_module_version": "1.2.0",
222
+ "_view_name": "StyleView",
223
+ "description_width": ""
224
+ }
225
+ },
226
+ "90bfdbd03ac44d5ca3df23ebcc02d863": {
227
+ "model_module": "@jupyter-widgets/base",
228
+ "model_name": "LayoutModel",
229
+ "model_module_version": "1.2.0",
230
+ "state": {
231
+ "_model_module": "@jupyter-widgets/base",
232
+ "_model_module_version": "1.2.0",
233
+ "_model_name": "LayoutModel",
234
+ "_view_count": null,
235
+ "_view_module": "@jupyter-widgets/base",
236
+ "_view_module_version": "1.2.0",
237
+ "_view_name": "LayoutView",
238
+ "align_content": null,
239
+ "align_items": null,
240
+ "align_self": null,
241
+ "border": null,
242
+ "bottom": null,
243
+ "display": null,
244
+ "flex": null,
245
+ "flex_flow": null,
246
+ "grid_area": null,
247
+ "grid_auto_columns": null,
248
+ "grid_auto_flow": null,
249
+ "grid_auto_rows": null,
250
+ "grid_column": null,
251
+ "grid_gap": null,
252
+ "grid_row": null,
253
+ "grid_template_areas": null,
254
+ "grid_template_columns": null,
255
+ "grid_template_rows": null,
256
+ "height": null,
257
+ "justify_content": null,
258
+ "justify_items": null,
259
+ "left": null,
260
+ "margin": null,
261
+ "max_height": null,
262
+ "max_width": null,
263
+ "min_height": null,
264
+ "min_width": null,
265
+ "object_fit": null,
266
+ "object_position": null,
267
+ "order": null,
268
+ "overflow": null,
269
+ "overflow_x": null,
270
+ "overflow_y": null,
271
+ "padding": null,
272
+ "right": null,
273
+ "top": null,
274
+ "visibility": null,
275
+ "width": null
276
+ }
277
+ },
278
+ "6618b65302df46098bd2ad029ed668eb": {
279
+ "model_module": "@jupyter-widgets/controls",
280
+ "model_name": "ProgressStyleModel",
281
+ "model_module_version": "1.5.0",
282
+ "state": {
283
+ "_model_module": "@jupyter-widgets/controls",
284
+ "_model_module_version": "1.5.0",
285
+ "_model_name": "ProgressStyleModel",
286
+ "_view_count": null,
287
+ "_view_module": "@jupyter-widgets/base",
288
+ "_view_module_version": "1.2.0",
289
+ "_view_name": "StyleView",
290
+ "bar_color": null,
291
+ "description_width": ""
292
+ }
293
+ },
294
+ "a04d20d2b4934f35a7890f24546023d7": {
295
+ "model_module": "@jupyter-widgets/base",
296
+ "model_name": "LayoutModel",
297
+ "model_module_version": "1.2.0",
298
+ "state": {
299
+ "_model_module": "@jupyter-widgets/base",
300
+ "_model_module_version": "1.2.0",
301
+ "_model_name": "LayoutModel",
302
+ "_view_count": null,
303
+ "_view_module": "@jupyter-widgets/base",
304
+ "_view_module_version": "1.2.0",
305
+ "_view_name": "LayoutView",
306
+ "align_content": null,
307
+ "align_items": null,
308
+ "align_self": null,
309
+ "border": null,
310
+ "bottom": null,
311
+ "display": null,
312
+ "flex": null,
313
+ "flex_flow": null,
314
+ "grid_area": null,
315
+ "grid_auto_columns": null,
316
+ "grid_auto_flow": null,
317
+ "grid_auto_rows": null,
318
+ "grid_column": null,
319
+ "grid_gap": null,
320
+ "grid_row": null,
321
+ "grid_template_areas": null,
322
+ "grid_template_columns": null,
323
+ "grid_template_rows": null,
324
+ "height": null,
325
+ "justify_content": null,
326
+ "justify_items": null,
327
+ "left": null,
328
+ "margin": null,
329
+ "max_height": null,
330
+ "max_width": null,
331
+ "min_height": null,
332
+ "min_width": null,
333
+ "object_fit": null,
334
+ "object_position": null,
335
+ "order": null,
336
+ "overflow": null,
337
+ "overflow_x": null,
338
+ "overflow_y": null,
339
+ "padding": null,
340
+ "right": null,
341
+ "top": null,
342
+ "visibility": null,
343
+ "width": null
344
+ }
345
+ },
346
+ "5f5df2985c6f49d3816888ec4ebd9add": {
347
+ "model_module": "@jupyter-widgets/controls",
348
+ "model_name": "DescriptionStyleModel",
349
+ "model_module_version": "1.5.0",
350
+ "state": {
351
+ "_model_module": "@jupyter-widgets/controls",
352
+ "_model_module_version": "1.5.0",
353
+ "_model_name": "DescriptionStyleModel",
354
+ "_view_count": null,
355
+ "_view_module": "@jupyter-widgets/base",
356
+ "_view_module_version": "1.2.0",
357
+ "_view_name": "StyleView",
358
+ "description_width": ""
359
+ }
360
+ },
361
+ "419e345a9ff34dce965af89ad6569ff1": {
362
+ "model_module": "@jupyter-widgets/controls",
363
+ "model_name": "HBoxModel",
364
+ "model_module_version": "1.5.0",
365
+ "state": {
366
+ "_dom_classes": [],
367
+ "_model_module": "@jupyter-widgets/controls",
368
+ "_model_module_version": "1.5.0",
369
+ "_model_name": "HBoxModel",
370
+ "_view_count": null,
371
+ "_view_module": "@jupyter-widgets/controls",
372
+ "_view_module_version": "1.5.0",
373
+ "_view_name": "HBoxView",
374
+ "box_style": "",
375
+ "children": [
376
+ "IPY_MODEL_2f1473b2871b49f5b916298ab9866bd8",
377
+ "IPY_MODEL_171e066c98944ad2abcad3f61d11144c",
378
+ "IPY_MODEL_67c666a7cc8e4ae5a5c6c7442fdb8484"
379
+ ],
380
+ "layout": "IPY_MODEL_3eb3a35b6b2b414fa725a49e002da0e4"
381
+ }
382
+ },
383
+ "2f1473b2871b49f5b916298ab9866bd8": {
384
+ "model_module": "@jupyter-widgets/controls",
385
+ "model_name": "HTMLModel",
386
+ "model_module_version": "1.5.0",
387
+ "state": {
388
+ "_dom_classes": [],
389
+ "_model_module": "@jupyter-widgets/controls",
390
+ "_model_module_version": "1.5.0",
391
+ "_model_name": "HTMLModel",
392
+ "_view_count": null,
393
+ "_view_module": "@jupyter-widgets/controls",
394
+ "_view_module_version": "1.5.0",
395
+ "_view_name": "HTMLView",
396
+ "description": "",
397
+ "description_tooltip": null,
398
+ "layout": "IPY_MODEL_12e03166c7854af59078eb601478338a",
399
+ "placeholder": "​",
400
+ "style": "IPY_MODEL_9310394718a64134b5d17290db28f880",
401
+ "value": "Map: 100%"
402
+ }
403
+ },
404
+ "171e066c98944ad2abcad3f61d11144c": {
405
+ "model_module": "@jupyter-widgets/controls",
406
+ "model_name": "FloatProgressModel",
407
+ "model_module_version": "1.5.0",
408
+ "state": {
409
+ "_dom_classes": [],
410
+ "_model_module": "@jupyter-widgets/controls",
411
+ "_model_module_version": "1.5.0",
412
+ "_model_name": "FloatProgressModel",
413
+ "_view_count": null,
414
+ "_view_module": "@jupyter-widgets/controls",
415
+ "_view_module_version": "1.5.0",
416
+ "_view_name": "ProgressView",
417
+ "bar_style": "success",
418
+ "description": "",
419
+ "description_tooltip": null,
420
+ "layout": "IPY_MODEL_c851e5be2363465981fa92285c6a9579",
421
+ "max": 18647,
422
+ "min": 0,
423
+ "orientation": "horizontal",
424
+ "style": "IPY_MODEL_3d2bdb8969cd45499c33a49ab73cab39",
425
+ "value": 18647
426
+ }
427
+ },
428
+ "67c666a7cc8e4ae5a5c6c7442fdb8484": {
429
+ "model_module": "@jupyter-widgets/controls",
430
+ "model_name": "HTMLModel",
431
+ "model_module_version": "1.5.0",
432
+ "state": {
433
+ "_dom_classes": [],
434
+ "_model_module": "@jupyter-widgets/controls",
435
+ "_model_module_version": "1.5.0",
436
+ "_model_name": "HTMLModel",
437
+ "_view_count": null,
438
+ "_view_module": "@jupyter-widgets/controls",
439
+ "_view_module_version": "1.5.0",
440
+ "_view_name": "HTMLView",
441
+ "description": "",
442
+ "description_tooltip": null,
443
+ "layout": "IPY_MODEL_5e584348fea147fb95649323af691ea9",
444
+ "placeholder": "​",
445
+ "style": "IPY_MODEL_20af10029189476d8f84badf52743b26",
446
+ "value": " 18647/18647 [00:10<00:00, 1932.89 examples/s]"
447
+ }
448
+ },
449
+ "3eb3a35b6b2b414fa725a49e002da0e4": {
450
+ "model_module": "@jupyter-widgets/base",
451
+ "model_name": "LayoutModel",
452
+ "model_module_version": "1.2.0",
453
+ "state": {
454
+ "_model_module": "@jupyter-widgets/base",
455
+ "_model_module_version": "1.2.0",
456
+ "_model_name": "LayoutModel",
457
+ "_view_count": null,
458
+ "_view_module": "@jupyter-widgets/base",
459
+ "_view_module_version": "1.2.0",
460
+ "_view_name": "LayoutView",
461
+ "align_content": null,
462
+ "align_items": null,
463
+ "align_self": null,
464
+ "border": null,
465
+ "bottom": null,
466
+ "display": null,
467
+ "flex": null,
468
+ "flex_flow": null,
469
+ "grid_area": null,
470
+ "grid_auto_columns": null,
471
+ "grid_auto_flow": null,
472
+ "grid_auto_rows": null,
473
+ "grid_column": null,
474
+ "grid_gap": null,
475
+ "grid_row": null,
476
+ "grid_template_areas": null,
477
+ "grid_template_columns": null,
478
+ "grid_template_rows": null,
479
+ "height": null,
480
+ "justify_content": null,
481
+ "justify_items": null,
482
+ "left": null,
483
+ "margin": null,
484
+ "max_height": null,
485
+ "max_width": null,
486
+ "min_height": null,
487
+ "min_width": null,
488
+ "object_fit": null,
489
+ "object_position": null,
490
+ "order": null,
491
+ "overflow": null,
492
+ "overflow_x": null,
493
+ "overflow_y": null,
494
+ "padding": null,
495
+ "right": null,
496
+ "top": null,
497
+ "visibility": null,
498
+ "width": null
499
+ }
500
+ },
501
+ "12e03166c7854af59078eb601478338a": {
502
+ "model_module": "@jupyter-widgets/base",
503
+ "model_name": "LayoutModel",
504
+ "model_module_version": "1.2.0",
505
+ "state": {
506
+ "_model_module": "@jupyter-widgets/base",
507
+ "_model_module_version": "1.2.0",
508
+ "_model_name": "LayoutModel",
509
+ "_view_count": null,
510
+ "_view_module": "@jupyter-widgets/base",
511
+ "_view_module_version": "1.2.0",
512
+ "_view_name": "LayoutView",
513
+ "align_content": null,
514
+ "align_items": null,
515
+ "align_self": null,
516
+ "border": null,
517
+ "bottom": null,
518
+ "display": null,
519
+ "flex": null,
520
+ "flex_flow": null,
521
+ "grid_area": null,
522
+ "grid_auto_columns": null,
523
+ "grid_auto_flow": null,
524
+ "grid_auto_rows": null,
525
+ "grid_column": null,
526
+ "grid_gap": null,
527
+ "grid_row": null,
528
+ "grid_template_areas": null,
529
+ "grid_template_columns": null,
530
+ "grid_template_rows": null,
531
+ "height": null,
532
+ "justify_content": null,
533
+ "justify_items": null,
534
+ "left": null,
535
+ "margin": null,
536
+ "max_height": null,
537
+ "max_width": null,
538
+ "min_height": null,
539
+ "min_width": null,
540
+ "object_fit": null,
541
+ "object_position": null,
542
+ "order": null,
543
+ "overflow": null,
544
+ "overflow_x": null,
545
+ "overflow_y": null,
546
+ "padding": null,
547
+ "right": null,
548
+ "top": null,
549
+ "visibility": null,
550
+ "width": null
551
+ }
552
+ },
553
+ "9310394718a64134b5d17290db28f880": {
554
+ "model_module": "@jupyter-widgets/controls",
555
+ "model_name": "DescriptionStyleModel",
556
+ "model_module_version": "1.5.0",
557
+ "state": {
558
+ "_model_module": "@jupyter-widgets/controls",
559
+ "_model_module_version": "1.5.0",
560
+ "_model_name": "DescriptionStyleModel",
561
+ "_view_count": null,
562
+ "_view_module": "@jupyter-widgets/base",
563
+ "_view_module_version": "1.2.0",
564
+ "_view_name": "StyleView",
565
+ "description_width": ""
566
+ }
567
+ },
568
+ "c851e5be2363465981fa92285c6a9579": {
569
+ "model_module": "@jupyter-widgets/base",
570
+ "model_name": "LayoutModel",
571
+ "model_module_version": "1.2.0",
572
+ "state": {
573
+ "_model_module": "@jupyter-widgets/base",
574
+ "_model_module_version": "1.2.0",
575
+ "_model_name": "LayoutModel",
576
+ "_view_count": null,
577
+ "_view_module": "@jupyter-widgets/base",
578
+ "_view_module_version": "1.2.0",
579
+ "_view_name": "LayoutView",
580
+ "align_content": null,
581
+ "align_items": null,
582
+ "align_self": null,
583
+ "border": null,
584
+ "bottom": null,
585
+ "display": null,
586
+ "flex": null,
587
+ "flex_flow": null,
588
+ "grid_area": null,
589
+ "grid_auto_columns": null,
590
+ "grid_auto_flow": null,
591
+ "grid_auto_rows": null,
592
+ "grid_column": null,
593
+ "grid_gap": null,
594
+ "grid_row": null,
595
+ "grid_template_areas": null,
596
+ "grid_template_columns": null,
597
+ "grid_template_rows": null,
598
+ "height": null,
599
+ "justify_content": null,
600
+ "justify_items": null,
601
+ "left": null,
602
+ "margin": null,
603
+ "max_height": null,
604
+ "max_width": null,
605
+ "min_height": null,
606
+ "min_width": null,
607
+ "object_fit": null,
608
+ "object_position": null,
609
+ "order": null,
610
+ "overflow": null,
611
+ "overflow_x": null,
612
+ "overflow_y": null,
613
+ "padding": null,
614
+ "right": null,
615
+ "top": null,
616
+ "visibility": null,
617
+ "width": null
618
+ }
619
+ },
620
+ "3d2bdb8969cd45499c33a49ab73cab39": {
621
+ "model_module": "@jupyter-widgets/controls",
622
+ "model_name": "ProgressStyleModel",
623
+ "model_module_version": "1.5.0",
624
+ "state": {
625
+ "_model_module": "@jupyter-widgets/controls",
626
+ "_model_module_version": "1.5.0",
627
+ "_model_name": "ProgressStyleModel",
628
+ "_view_count": null,
629
+ "_view_module": "@jupyter-widgets/base",
630
+ "_view_module_version": "1.2.0",
631
+ "_view_name": "StyleView",
632
+ "bar_color": null,
633
+ "description_width": ""
634
+ }
635
+ },
636
+ "5e584348fea147fb95649323af691ea9": {
637
+ "model_module": "@jupyter-widgets/base",
638
+ "model_name": "LayoutModel",
639
+ "model_module_version": "1.2.0",
640
+ "state": {
641
+ "_model_module": "@jupyter-widgets/base",
642
+ "_model_module_version": "1.2.0",
643
+ "_model_name": "LayoutModel",
644
+ "_view_count": null,
645
+ "_view_module": "@jupyter-widgets/base",
646
+ "_view_module_version": "1.2.0",
647
+ "_view_name": "LayoutView",
648
+ "align_content": null,
649
+ "align_items": null,
650
+ "align_self": null,
651
+ "border": null,
652
+ "bottom": null,
653
+ "display": null,
654
+ "flex": null,
655
+ "flex_flow": null,
656
+ "grid_area": null,
657
+ "grid_auto_columns": null,
658
+ "grid_auto_flow": null,
659
+ "grid_auto_rows": null,
660
+ "grid_column": null,
661
+ "grid_gap": null,
662
+ "grid_row": null,
663
+ "grid_template_areas": null,
664
+ "grid_template_columns": null,
665
+ "grid_template_rows": null,
666
+ "height": null,
667
+ "justify_content": null,
668
+ "justify_items": null,
669
+ "left": null,
670
+ "margin": null,
671
+ "max_height": null,
672
+ "max_width": null,
673
+ "min_height": null,
674
+ "min_width": null,
675
+ "object_fit": null,
676
+ "object_position": null,
677
+ "order": null,
678
+ "overflow": null,
679
+ "overflow_x": null,
680
+ "overflow_y": null,
681
+ "padding": null,
682
+ "right": null,
683
+ "top": null,
684
+ "visibility": null,
685
+ "width": null
686
+ }
687
+ },
688
+ "20af10029189476d8f84badf52743b26": {
689
+ "model_module": "@jupyter-widgets/controls",
690
+ "model_name": "DescriptionStyleModel",
691
+ "model_module_version": "1.5.0",
692
+ "state": {
693
+ "_model_module": "@jupyter-widgets/controls",
694
+ "_model_module_version": "1.5.0",
695
+ "_model_name": "DescriptionStyleModel",
696
+ "_view_count": null,
697
+ "_view_module": "@jupyter-widgets/base",
698
+ "_view_module_version": "1.2.0",
699
+ "_view_name": "StyleView",
700
+ "description_width": ""
701
+ }
702
+ },
703
+ "cc5509c288034ce6939f574301fd5eb2": {
704
+ "model_module": "@jupyter-widgets/controls",
705
+ "model_name": "HBoxModel",
706
+ "model_module_version": "1.5.0",
707
+ "state": {
708
+ "_dom_classes": [],
709
+ "_model_module": "@jupyter-widgets/controls",
710
+ "_model_module_version": "1.5.0",
711
+ "_model_name": "HBoxModel",
712
+ "_view_count": null,
713
+ "_view_module": "@jupyter-widgets/controls",
714
+ "_view_module_version": "1.5.0",
715
+ "_view_name": "HBoxView",
716
+ "box_style": "",
717
+ "children": [
718
+ "IPY_MODEL_6eee983896f948f7919722045b190b01",
719
+ "IPY_MODEL_114a72cf85334713aefd3ee3616ebeb2",
720
+ "IPY_MODEL_4dfd352e599c4676b9871da14e8d75b5"
721
+ ],
722
+ "layout": "IPY_MODEL_efa7962c591e4b4bb6c54f008444bb0f"
723
+ }
724
+ },
725
+ "6eee983896f948f7919722045b190b01": {
726
+ "model_module": "@jupyter-widgets/controls",
727
+ "model_name": "HTMLModel",
728
+ "model_module_version": "1.5.0",
729
+ "state": {
730
+ "_dom_classes": [],
731
+ "_model_module": "@jupyter-widgets/controls",
732
+ "_model_module_version": "1.5.0",
733
+ "_model_name": "HTMLModel",
734
+ "_view_count": null,
735
+ "_view_module": "@jupyter-widgets/controls",
736
+ "_view_module_version": "1.5.0",
737
+ "_view_name": "HTMLView",
738
+ "description": "",
739
+ "description_tooltip": null,
740
+ "layout": "IPY_MODEL_0169ffddc6014c2ebf92ee902e198e1a",
741
+ "placeholder": "​",
742
+ "style": "IPY_MODEL_0e73212111294bdda9c79b97696aaa48",
743
+ "value": "Downloading builder script: "
744
+ }
745
+ },
746
+ "114a72cf85334713aefd3ee3616ebeb2": {
747
+ "model_module": "@jupyter-widgets/controls",
748
+ "model_name": "FloatProgressModel",
749
+ "model_module_version": "1.5.0",
750
+ "state": {
751
+ "_dom_classes": [],
752
+ "_model_module": "@jupyter-widgets/controls",
753
+ "_model_module_version": "1.5.0",
754
+ "_model_name": "FloatProgressModel",
755
+ "_view_count": null,
756
+ "_view_module": "@jupyter-widgets/controls",
757
+ "_view_module_version": "1.5.0",
758
+ "_view_name": "ProgressView",
759
+ "bar_style": "success",
760
+ "description": "",
761
+ "description_tooltip": null,
762
+ "layout": "IPY_MODEL_6cf0f639236040b9ab8bc84665a1e94a",
763
+ "max": 1,
764
+ "min": 0,
765
+ "orientation": "horizontal",
766
+ "style": "IPY_MODEL_ab1eb221a83e4e7299e5fe1056cbf4b8",
767
+ "value": 1
768
+ }
769
+ },
770
+ "4dfd352e599c4676b9871da14e8d75b5": {
771
+ "model_module": "@jupyter-widgets/controls",
772
+ "model_name": "HTMLModel",
773
+ "model_module_version": "1.5.0",
774
+ "state": {
775
+ "_dom_classes": [],
776
+ "_model_module": "@jupyter-widgets/controls",
777
+ "_model_module_version": "1.5.0",
778
+ "_model_name": "HTMLModel",
779
+ "_view_count": null,
780
+ "_view_module": "@jupyter-widgets/controls",
781
+ "_view_module_version": "1.5.0",
782
+ "_view_name": "HTMLView",
783
+ "description": "",
784
+ "description_tooltip": null,
785
+ "layout": "IPY_MODEL_d13fccbfd59c43f687edb2aa0bdd6145",
786
+ "placeholder": "​",
787
+ "style": "IPY_MODEL_06c209127feb497bb2e171d229183f16",
788
+ "value": " 8.15k/? [00:00<00:00, 561kB/s]"
789
+ }
790
+ },
791
+ "efa7962c591e4b4bb6c54f008444bb0f": {
792
+ "model_module": "@jupyter-widgets/base",
793
+ "model_name": "LayoutModel",
794
+ "model_module_version": "1.2.0",
795
+ "state": {
796
+ "_model_module": "@jupyter-widgets/base",
797
+ "_model_module_version": "1.2.0",
798
+ "_model_name": "LayoutModel",
799
+ "_view_count": null,
800
+ "_view_module": "@jupyter-widgets/base",
801
+ "_view_module_version": "1.2.0",
802
+ "_view_name": "LayoutView",
803
+ "align_content": null,
804
+ "align_items": null,
805
+ "align_self": null,
806
+ "border": null,
807
+ "bottom": null,
808
+ "display": null,
809
+ "flex": null,
810
+ "flex_flow": null,
811
+ "grid_area": null,
812
+ "grid_auto_columns": null,
813
+ "grid_auto_flow": null,
814
+ "grid_auto_rows": null,
815
+ "grid_column": null,
816
+ "grid_gap": null,
817
+ "grid_row": null,
818
+ "grid_template_areas": null,
819
+ "grid_template_columns": null,
820
+ "grid_template_rows": null,
821
+ "height": null,
822
+ "justify_content": null,
823
+ "justify_items": null,
824
+ "left": null,
825
+ "margin": null,
826
+ "max_height": null,
827
+ "max_width": null,
828
+ "min_height": null,
829
+ "min_width": null,
830
+ "object_fit": null,
831
+ "object_position": null,
832
+ "order": null,
833
+ "overflow": null,
834
+ "overflow_x": null,
835
+ "overflow_y": null,
836
+ "padding": null,
837
+ "right": null,
838
+ "top": null,
839
+ "visibility": null,
840
+ "width": null
841
+ }
842
+ },
843
+ "0169ffddc6014c2ebf92ee902e198e1a": {
844
+ "model_module": "@jupyter-widgets/base",
845
+ "model_name": "LayoutModel",
846
+ "model_module_version": "1.2.0",
847
+ "state": {
848
+ "_model_module": "@jupyter-widgets/base",
849
+ "_model_module_version": "1.2.0",
850
+ "_model_name": "LayoutModel",
851
+ "_view_count": null,
852
+ "_view_module": "@jupyter-widgets/base",
853
+ "_view_module_version": "1.2.0",
854
+ "_view_name": "LayoutView",
855
+ "align_content": null,
856
+ "align_items": null,
857
+ "align_self": null,
858
+ "border": null,
859
+ "bottom": null,
860
+ "display": null,
861
+ "flex": null,
862
+ "flex_flow": null,
863
+ "grid_area": null,
864
+ "grid_auto_columns": null,
865
+ "grid_auto_flow": null,
866
+ "grid_auto_rows": null,
867
+ "grid_column": null,
868
+ "grid_gap": null,
869
+ "grid_row": null,
870
+ "grid_template_areas": null,
871
+ "grid_template_columns": null,
872
+ "grid_template_rows": null,
873
+ "height": null,
874
+ "justify_content": null,
875
+ "justify_items": null,
876
+ "left": null,
877
+ "margin": null,
878
+ "max_height": null,
879
+ "max_width": null,
880
+ "min_height": null,
881
+ "min_width": null,
882
+ "object_fit": null,
883
+ "object_position": null,
884
+ "order": null,
885
+ "overflow": null,
886
+ "overflow_x": null,
887
+ "overflow_y": null,
888
+ "padding": null,
889
+ "right": null,
890
+ "top": null,
891
+ "visibility": null,
892
+ "width": null
893
+ }
894
+ },
895
+ "0e73212111294bdda9c79b97696aaa48": {
896
+ "model_module": "@jupyter-widgets/controls",
897
+ "model_name": "DescriptionStyleModel",
898
+ "model_module_version": "1.5.0",
899
+ "state": {
900
+ "_model_module": "@jupyter-widgets/controls",
901
+ "_model_module_version": "1.5.0",
902
+ "_model_name": "DescriptionStyleModel",
903
+ "_view_count": null,
904
+ "_view_module": "@jupyter-widgets/base",
905
+ "_view_module_version": "1.2.0",
906
+ "_view_name": "StyleView",
907
+ "description_width": ""
908
+ }
909
+ },
910
+ "6cf0f639236040b9ab8bc84665a1e94a": {
911
+ "model_module": "@jupyter-widgets/base",
912
+ "model_name": "LayoutModel",
913
+ "model_module_version": "1.2.0",
914
+ "state": {
915
+ "_model_module": "@jupyter-widgets/base",
916
+ "_model_module_version": "1.2.0",
917
+ "_model_name": "LayoutModel",
918
+ "_view_count": null,
919
+ "_view_module": "@jupyter-widgets/base",
920
+ "_view_module_version": "1.2.0",
921
+ "_view_name": "LayoutView",
922
+ "align_content": null,
923
+ "align_items": null,
924
+ "align_self": null,
925
+ "border": null,
926
+ "bottom": null,
927
+ "display": null,
928
+ "flex": null,
929
+ "flex_flow": null,
930
+ "grid_area": null,
931
+ "grid_auto_columns": null,
932
+ "grid_auto_flow": null,
933
+ "grid_auto_rows": null,
934
+ "grid_column": null,
935
+ "grid_gap": null,
936
+ "grid_row": null,
937
+ "grid_template_areas": null,
938
+ "grid_template_columns": null,
939
+ "grid_template_rows": null,
940
+ "height": null,
941
+ "justify_content": null,
942
+ "justify_items": null,
943
+ "left": null,
944
+ "margin": null,
945
+ "max_height": null,
946
+ "max_width": null,
947
+ "min_height": null,
948
+ "min_width": null,
949
+ "object_fit": null,
950
+ "object_position": null,
951
+ "order": null,
952
+ "overflow": null,
953
+ "overflow_x": null,
954
+ "overflow_y": null,
955
+ "padding": null,
956
+ "right": null,
957
+ "top": null,
958
+ "visibility": null,
959
+ "width": "20px"
960
+ }
961
+ },
962
+ "ab1eb221a83e4e7299e5fe1056cbf4b8": {
963
+ "model_module": "@jupyter-widgets/controls",
964
+ "model_name": "ProgressStyleModel",
965
+ "model_module_version": "1.5.0",
966
+ "state": {
967
+ "_model_module": "@jupyter-widgets/controls",
968
+ "_model_module_version": "1.5.0",
969
+ "_model_name": "ProgressStyleModel",
970
+ "_view_count": null,
971
+ "_view_module": "@jupyter-widgets/base",
972
+ "_view_module_version": "1.2.0",
973
+ "_view_name": "StyleView",
974
+ "bar_color": null,
975
+ "description_width": ""
976
+ }
977
+ },
978
+ "d13fccbfd59c43f687edb2aa0bdd6145": {
979
+ "model_module": "@jupyter-widgets/base",
980
+ "model_name": "LayoutModel",
981
+ "model_module_version": "1.2.0",
982
+ "state": {
983
+ "_model_module": "@jupyter-widgets/base",
984
+ "_model_module_version": "1.2.0",
985
+ "_model_name": "LayoutModel",
986
+ "_view_count": null,
987
+ "_view_module": "@jupyter-widgets/base",
988
+ "_view_module_version": "1.2.0",
989
+ "_view_name": "LayoutView",
990
+ "align_content": null,
991
+ "align_items": null,
992
+ "align_self": null,
993
+ "border": null,
994
+ "bottom": null,
995
+ "display": null,
996
+ "flex": null,
997
+ "flex_flow": null,
998
+ "grid_area": null,
999
+ "grid_auto_columns": null,
1000
+ "grid_auto_flow": null,
1001
+ "grid_auto_rows": null,
1002
+ "grid_column": null,
1003
+ "grid_gap": null,
1004
+ "grid_row": null,
1005
+ "grid_template_areas": null,
1006
+ "grid_template_columns": null,
1007
+ "grid_template_rows": null,
1008
+ "height": null,
1009
+ "justify_content": null,
1010
+ "justify_items": null,
1011
+ "left": null,
1012
+ "margin": null,
1013
+ "max_height": null,
1014
+ "max_width": null,
1015
+ "min_height": null,
1016
+ "min_width": null,
1017
+ "object_fit": null,
1018
+ "object_position": null,
1019
+ "order": null,
1020
+ "overflow": null,
1021
+ "overflow_x": null,
1022
+ "overflow_y": null,
1023
+ "padding": null,
1024
+ "right": null,
1025
+ "top": null,
1026
+ "visibility": null,
1027
+ "width": null
1028
+ }
1029
+ },
1030
+ "06c209127feb497bb2e171d229183f16": {
1031
+ "model_module": "@jupyter-widgets/controls",
1032
+ "model_name": "DescriptionStyleModel",
1033
+ "model_module_version": "1.5.0",
1034
+ "state": {
1035
+ "_model_module": "@jupyter-widgets/controls",
1036
+ "_model_module_version": "1.5.0",
1037
+ "_model_name": "DescriptionStyleModel",
1038
+ "_view_count": null,
1039
+ "_view_module": "@jupyter-widgets/base",
1040
+ "_view_module_version": "1.2.0",
1041
+ "_view_name": "StyleView",
1042
+ "description_width": ""
1043
+ }
1044
+ }
1045
+ }
1046
+ }
1047
+ },
1048
+ "cells": [
1049
+ {
1050
+ "cell_type": "markdown",
1051
+ "metadata": {
1052
+ "id": "view-in-github"
1053
+ },
1054
+ "source": [
1055
+ "<a href=\"https://colab.research.google.com/github/your-username/mt5-finetune-en-de/blob/main/mt5_finetune_en_de.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
1056
+ ]
1057
+ },
1058
+ {
1059
+ "cell_type": "markdown",
1060
+ "source": [
1061
+ "# Fine-Tuning T5 for English → Breton, Cornish, and Welsh Translation\n",
1061
+ "\n",
1062
+ "This notebook fine-tunes **`t5-small`** on a custom parallel corpus (`parallel_corpus.json`) using Hugging Face Transformers.\n",
1063
+ "\n",
1064
+ "- Model: T5 (`t5-small`), loaded through `AutoModelForSeq2SeqLM`\n",
1065
+ "- Task: Teach it **English → Breton / Cornish / Welsh translation** via fine-tuning, selecting the target language with a prompt prefix\n",
1066
+ "- Dataset: 31,192 aligned entries read from Google Drive, yielding 93,233 sentence pairs across the three languages\n",
1067
+ "- Framework: `transformers` (`Seq2SeqTrainer`)\n",
1069
+ "\n",
1070
+ "---"
1071
+ ],
1072
+ "metadata": {
1073
+ "id": "XnxaVxCe8gjQ"
1074
+ }
1075
+ },
1076
+ {
1077
+ "cell_type": "markdown",
1078
+ "source": [
1079
+ "## 1. Mount Google Drive & Install Dependencies"
1080
+ ],
1081
+ "metadata": {
1082
+ "id": "P05w2JE68gjR"
1083
+ }
1084
+ },
1085
+ {
1086
+ "cell_type": "code",
1087
+ "source": [
1088
+ "from google.colab import drive\n",
1089
+ "drive.mount('/content/drive')\n"
1090
+ ],
1091
+ "metadata": {
1092
+ "colab": {
1093
+ "base_uri": "https://localhost:8080/"
1094
+ },
1095
+ "id": "3-EOS45xALKJ",
1096
+ "outputId": "7b790ea5-c852-4688-9189-91a725c6be72"
1097
+ },
1098
+ "execution_count": 1,
1099
+ "outputs": [
1100
+ {
1101
+ "output_type": "stream",
1102
+ "name": "stdout",
1103
+ "text": [
1104
+ "Mounted at /content/drive\n"
1105
+ ]
1106
+ }
1107
+ ]
1108
+ },
1109
+ {
1110
+ "cell_type": "code",
1111
+ "execution_count": 2,
1112
+ "outputs": [
1113
+ {
1114
+ "output_type": "stream",
1115
+ "name": "stdout",
1116
+ "text": [
1117
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m51.8/51.8 kB\u001b[0m \u001b[31m2.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
1118
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m104.1/104.1 kB\u001b[0m \u001b[31m4.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
1119
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.1/84.1 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
1120
+ "\u001b[?25h"
1121
+ ]
1122
+ }
1123
+ ],
1124
+ "source": [
1125
+ "!pip install -q transformers datasets sentencepiece sacrebleu accelerate evaluate\n",
1126
+ "!pip install -q torch --index-url https://download.pytorch.org/whl/cu118"
1127
+ ],
1128
+ "metadata": {
1129
+ "colab": {
1130
+ "base_uri": "https://localhost:8080/"
1131
+ },
1132
+ "id": "tBzChq5L8gjR",
1133
+ "outputId": "264ac5a0-a9ef-4f49-a5be-6045a15181f6"
1134
+ }
1135
+ },
1136
+ {
1137
+ "cell_type": "code",
1138
+ "source": [
1139
+ "!pip install pandas"
1140
+ ],
1141
+ "metadata": {
1142
+ "colab": {
1143
+ "base_uri": "https://localhost:8080/"
1144
+ },
1145
+ "id": "1O0iGKJ7A6yW",
1146
+ "outputId": "4194b26b-9090-4444-866f-b86e7058179b"
1147
+ },
1148
+ "execution_count": 5,
1149
+ "outputs": [
1150
+ {
1151
+ "output_type": "stream",
1152
+ "name": "stdout",
1153
+ "text": [
1154
+ "Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2)\n",
1155
+ "Requirement already satisfied: numpy>=1.26.0 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.0.2)\n",
1156
+ "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0)\n",
1157
+ "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2)\n",
1158
+ "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2)\n",
1159
+ "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)\n"
1160
+ ]
1161
+ }
1162
+ ]
1163
+ },
1164
+ {
1165
+ "cell_type": "markdown",
1166
+ "source": [
1167
+ "## 2. Load Dataset (English → Breton, Cornish, Welsh)"
1168
+ ],
1169
+ "metadata": {
1170
+ "id": "_niqDW2e8gjS"
1171
+ }
1172
+ },
1173
+ {
1174
+ "cell_type": "code",
1175
+ "source": [
1176
+ "import json\n",
1176
+ "\n",
1177
+ "# Parallel corpus stored on the mounted Google Drive\n",
1178
+ "data_path = \"/content/drive/MyDrive/llm-translator/parallel_corpus.json\" # Adjust path\n",
1179
+ "try:\n",
1180
+ "    with open(data_path, \"r\", encoding=\"utf-8\") as file:\n",
1181
+ "        data = json.load(file)\n",
1182
+ "    print(f\"Loaded {len(data)} entries from parallel_corpus.json\")\n",
1183
+ "except FileNotFoundError:\n",
1184
+ "    print(f\"Error: The file '{data_path}' was not found.\")\n",
1185
+ "    raise  # re-raise so the cell stops with a traceback\n",
1186
+ "except json.JSONDecodeError:\n",
1187
+ "    print(\"Error: Failed to decode JSON from the file.\")\n",
1188
+ "    raise  # re-raise so the cell stops with a traceback"
1190
+ ],
1191
+ "metadata": {
1192
+ "colab": {
1193
+ "base_uri": "https://localhost:8080/"
1194
+ },
1195
+ "id": "ds1obOiM8gjT",
1196
+ "outputId": "25f14e99-d3fb-491b-c9c8-7807e8d2b6f5"
1197
+ },
1198
+ "execution_count": 3,
1199
+ "outputs": [
1200
+ {
1201
+ "output_type": "stream",
1202
+ "name": "stdout",
1203
+ "text": [
1204
+ "Loaded 31192 entries from parallel_corpus.json\n"
1205
+ ]
1206
+ }
1207
+ ]
1208
+ },
1209
+ {
1210
+ "cell_type": "code",
1211
+ "source": [
1212
+ "import string\n",
1213
+ "import pandas as pd\n",
1214
+ "from datasets import Dataset, DatasetDict\n",
1215
+ "\n",
1216
+ "\n",
1217
+ "# Build the three language pairs (English → Breton, Cornish, Welsh)\n",
1218
+ "def is_valid(t): return bool(t and t.strip() and t.strip() not in string.punctuation)\n",
1219
+ "df = pd.DataFrame(data)\n",
1220
+ "breton_df = df[df.apply(lambda r: is_valid(r[\"niv_text\"]) and is_valid(r[\"koad21_text\"]), axis=1)][[\"niv_text\",\"koad21_text\"]].rename(columns={\"niv_text\":\"en\",\"koad21_text\":\"target\"})\n",
1221
+ "breton_df[\"language\"] = \"br\"\n",
1222
+ "cornish_df = df[df.apply(lambda r: is_valid(r[\"niv_text\"]) and is_valid(r[\"abk_text\"]), axis=1)][[\"niv_text\",\"abk_text\"]].rename(columns={\"niv_text\":\"en\",\"abk_text\":\"target\"})\n",
1223
+ "cornish_df[\"language\"] = \"abk\"\n",
1224
+ "welsh_df = df[df.apply(lambda r: is_valid(r[\"niv_text\"]) and is_valid(r[\"bcnda_text\"]), axis=1)][[\"niv_text\",\"bcnda_text\"]].rename(columns={\"niv_text\":\"en\",\"bcnda_text\":\"target\"})\n",
1225
+ "welsh_df[\"language\"] = \"cy\"\n",
1226
+ "\n",
1227
+ "\n",
1228
+ "combined_df = pd.concat([breton_df, cornish_df, welsh_df], ignore_index=True)\n",
1229
+ "print(combined_df.head(5))\n",
1230
+ "dataset = Dataset.from_pandas(combined_df).train_test_split(test_size=0.2, seed=42)\n",
1231
+ "print(f\"Combined dataset size: {len(combined_df)} pairs (Breton: {len(breton_df)}, Cornish: {len(cornish_df)}, Welsh: {len(welsh_df)})\")\n",
1232
+ "\n",
1233
+ "raw_datasets = DatasetDict({\n",
1234
+ " \"train\": dataset[\"train\"],\n",
1235
+ " \"test\" : dataset[\"test\"]\n",
1236
+ "})\n",
1237
+ "print(f\"Train: {len(raw_datasets['train'])}, Test: {len(raw_datasets['test'])}\")\n",
1238
+ "print(raw_datasets['train'])"
1239
+ ],
1240
+ "metadata": {
1241
+ "colab": {
1242
+ "base_uri": "https://localhost:8080/"
1243
+ },
1244
+ "id": "jbv3mVHMAdZT",
1245
+ "outputId": "f42a3f59-4be9-46f3-fc9d-80f4614cbf3f"
1246
+ },
1247
+ "execution_count": 10,
1248
+ "outputs": [
1249
+ {
1250
+ "output_type": "stream",
1251
+ "name": "stdout",
1252
+ "text": [
1253
+ " en \\\n",
1254
+ "0 The Lord called to Moses and spoke to him from... \n",
1255
+ "1 “Speak to the Israelites and say to them: ‘Whe... \n",
1256
+ "2 “ ‘If the offering is a burnt offering from th... \n",
1257
+ "3 You are to lay your hand on the head of the bu... \n",
1258
+ "4 You are to slaughter the young bull before the... \n",
1259
+ "\n",
1260
+ " target language \n",
1261
+ "0 An AOTROU a c’halvas Moizez hag a gomzas dezha... br \n",
1262
+ "1 Komz da vibien Israel ha lavar: Pa raio unan b... br \n",
1263
+ "2 Mar d-eo e brof ul loskaberzh a loened bras, e... br \n",
1264
+ "3 Lakaat a raio e zorn war benn al loskaberzh, a... br \n",
1265
+ "4 Lazhañ a raio ar c’hole dirak an AOTROU ; an a... br \n",
1266
+ "Combined dataset size: 93233 pairs (Breton: 31077, Cornish: 31086, Welsh: 31070)\n",
1267
+ "Train: 74586, Test: 18647\n",
1268
+ "Dataset({\n",
1269
+ " features: ['en', 'target', 'language'],\n",
1270
+ " num_rows: 74586\n",
1271
+ "})\n"
1272
+ ]
1273
+ }
1274
+ ]
1275
+ },
1276
+ {
1277
+ "cell_type": "markdown",
1278
+ "source": [
1279
+ "## 3. Load Model & Tokenizer"
1280
+ ],
1281
+ "metadata": {
1282
+ "id": "920INjs88gjT"
1283
+ }
1284
+ },
1285
+ {
1286
+ "cell_type": "code",
1287
+ "source": [
1288
+ "from transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n",
1289
+ "\n",
1290
+ "model_name = \"t5-small\"\n",
1291
+ "tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)\n",
1292
+ "model = AutoModelForSeq2SeqLM.from_pretrained(model_name)\n",
1293
+ "\n",
1294
+ "# use_fast=False loads the slow, SentencePiece-based T5 tokenizer"
1295
+ ],
1296
+ "metadata": {
1297
+ "id": "7XvJ2Sl38gjT"
1298
+ },
1299
+ "execution_count": 11,
1300
+ "outputs": []
1301
+ },
1302
+ {
1303
+ "cell_type": "markdown",
1304
+ "source": [
1305
+ "## 4. Preprocess Function"
1306
+ ],
1307
+ "metadata": {
1308
+ "id": "H-cPZfJj8gjU"
1309
+ }
1310
+ },
1311
+ {
1312
+ "cell_type": "code",
1313
+ "source": [
1314
+ "max_input_length = 128\n",
1315
+ "max_target_length = 128\n",
1316
+ "\n",
1317
+ "def preprocess(examples):\n",
1317
+ "    inputs = [f\"translate English to {lang}: {en}\"\n",
1318
+ "              for lang, en in zip(examples[\"language\"], examples[\"en\"])]\n",
1319
+ "    targets = examples[\"target\"]\n",
1320
+ "    model_inputs = tokenizer(inputs, max_length=max_input_length,\n",
1321
+ "                             truncation=True, padding=\"max_length\")\n",
1322
+ "    labels = tokenizer(targets, max_length=max_target_length,\n",
1323
+ "                       truncation=True, padding=\"max_length\").input_ids\n",
1324
+ "    model_inputs[\"labels\"] = [[(t if t != tokenizer.pad_token_id else -100) for t in seq] for seq in labels]  # mask padding so the loss ignores it\n",
1325
+ "    return model_inputs\n",
1327
+ "\n"
1328
+ ],
1329
+ "metadata": {
1330
+ "id": "eY7-UjaE8gjU"
1331
+ },
1332
+ "execution_count": 13,
1333
+ "outputs": []
1334
+ },
1335
+ {
1336
+ "cell_type": "code",
1337
+ "source": [
1338
+ "# Apply preprocessing\n",
1339
+ "print(\"Tokenising …\")\n",
1340
+ "print(raw_datasets)\n",
1341
+ "tokenized_datasets = raw_datasets.map(preprocess,\n",
1342
+ " batched=True,\n",
1343
+ " remove_columns=raw_datasets[\"train\"].column_names)\n",
1344
+ "\n",
1345
+ "print(tokenized_datasets)"
1346
+ ],
1347
+ "metadata": {
1348
+ "colab": {
1349
+ "base_uri": "https://localhost:8080/",
1350
+ "height": 456,
1351
+ "referenced_widgets": [
1352
+ "8a49fb7918744d23bf64237c2674fc6b",
1353
+ "7a64e72d88084b6897d53c344aad4358",
1354
+ "85b34cfea65c4a8dbdae30a8eec22a98",
1355
+ "500b7fb2607c4eebbb5bea0bfc4e33ff",
1356
+ "b53f4a68898e45098d6801e4987e4c10",
1357
+ "008567f59c13464b82ca3946ccf0058c",
1358
+ "6ac91c5649bc4b6fa0d3860e71cdb01b",
1359
+ "90bfdbd03ac44d5ca3df23ebcc02d863",
1360
+ "6618b65302df46098bd2ad029ed668eb",
1361
+ "a04d20d2b4934f35a7890f24546023d7",
1362
+ "5f5df2985c6f49d3816888ec4ebd9add",
1363
+ "419e345a9ff34dce965af89ad6569ff1",
1364
+ "2f1473b2871b49f5b916298ab9866bd8",
1365
+ "171e066c98944ad2abcad3f61d11144c",
1366
+ "67c666a7cc8e4ae5a5c6c7442fdb8484",
1367
+ "3eb3a35b6b2b414fa725a49e002da0e4",
1368
+ "12e03166c7854af59078eb601478338a",
1369
+ "9310394718a64134b5d17290db28f880",
1370
+ "c851e5be2363465981fa92285c6a9579",
1371
+ "3d2bdb8969cd45499c33a49ab73cab39",
1372
+ "5e584348fea147fb95649323af691ea9",
1373
+ "20af10029189476d8f84badf52743b26"
1374
+ ]
1375
+ },
1376
+ "id": "HVO9qHXZ8gjV",
1377
+ "outputId": "0a5edd95-149e-4596-abc3-2668b500f7b1"
1378
+ },
1379
+ "execution_count": 14,
1380
+ "outputs": [
1381
+ {
1382
+ "output_type": "stream",
1383
+ "name": "stdout",
1384
+ "text": [
1385
+ "Tokenising …\n",
1386
+ "DatasetDict({\n",
1387
+ " train: Dataset({\n",
1388
+ " features: ['en', 'target', 'language'],\n",
1389
+ " num_rows: 74586\n",
1390
+ " })\n",
1391
+ " test: Dataset({\n",
1392
+ " features: ['en', 'target', 'language'],\n",
1393
+ " num_rows: 18647\n",
1394
+ " })\n",
1395
+ "})\n"
1396
+ ]
1397
+ },
1398
+ {
1399
+ "output_type": "display_data",
1400
+ "data": {
1401
+ "text/plain": [
1402
+ "Map: 0%| | 0/74586 [00:00<?, ? examples/s]"
1403
+ ],
1404
+ "application/vnd.jupyter.widget-view+json": {
1405
+ "version_major": 2,
1406
+ "version_minor": 0,
1407
+ "model_id": "8a49fb7918744d23bf64237c2674fc6b"
1408
+ }
1409
+ },
1410
+ "metadata": {}
1411
+ },
1412
+ {
1413
+ "output_type": "display_data",
1414
+ "data": {
1415
+ "text/plain": [
1416
+ "Map: 0%| | 0/18647 [00:00<?, ? examples/s]"
1417
+ ],
1418
+ "application/vnd.jupyter.widget-view+json": {
1419
+ "version_major": 2,
1420
+ "version_minor": 0,
1421
+ "model_id": "419e345a9ff34dce965af89ad6569ff1"
1422
+ }
1423
+ },
1424
+ "metadata": {}
1425
+ },
1426
+ {
1427
+ "output_type": "stream",
1428
+ "name": "stdout",
1429
+ "text": [
1430
+ "DatasetDict({\n",
1431
+ " train: Dataset({\n",
1432
+ " features: ['input_ids', 'attention_mask', 'labels'],\n",
1433
+ " num_rows: 74586\n",
1434
+ " })\n",
1435
+ " test: Dataset({\n",
1436
+ " features: ['input_ids', 'attention_mask', 'labels'],\n",
1437
+ " num_rows: 18647\n",
1438
+ " })\n",
1439
+ "})\n"
1440
+ ]
1441
+ }
1442
+ ]
1443
+ },
1444
+ {
1445
+ "cell_type": "markdown",
1446
+ "source": [
1447
+ "## 5. Data Collator"
1448
+ ],
1449
+ "metadata": {
1450
+ "id": "TynYzUyG8gjV"
1451
+ }
1452
+ },
1453
+ {
1454
+ "cell_type": "code",
1455
+ "source": [
1456
+ "from transformers import DataCollatorForSeq2Seq\n",
1457
+ "\n",
1458
+ "data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)"
1459
+ ],
1460
+ "metadata": {
1461
+ "id": "CAumdmQH8gjV"
1462
+ },
1463
+ "execution_count": 15,
1464
+ "outputs": []
1465
+ },
1466
+ {
1467
+ "cell_type": "markdown",
1468
+ "source": [
1469
+ "## 6. Evaluation Metric (BLEU)"
1470
+ ],
1471
+ "metadata": {
1472
+ "id": "4CO7Ribv8gjV"
1473
+ }
1474
+ },
1475
+ {
1476
+ "cell_type": "code",
1477
+ "source": [
1478
+ "import evaluate\n",
1479
+ "import numpy as np\n",
1480
+ "\n",
1481
+ "metric = evaluate.load(\"sacrebleu\")\n",
1482
+ "\n",
1483
+ "def postprocess_text(preds, labels):\n",
1483
+ "    preds = [pred.strip() for pred in preds]\n",
1484
+ "    labels = [[label.strip()] for label in labels]\n",
1485
+ "    return preds, labels\n",
1486
+ "\n",
1487
+ "def compute_metrics(eval_preds):\n",
1488
+ "    preds, labels = eval_preds\n",
1489
+ "    if isinstance(preds, tuple):\n",
1490
+ "        preds = preds[0]\n",
1491
+ "    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)\n",
1492
+ "\n",
1493
+ "    # Replace -100 in the labels as we can't decode them\n",
1494
+ "    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)\n",
1495
+ "    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)\n",
1496
+ "\n",
1497
+ "    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)\n",
1498
+ "\n",
1499
+ "    result = metric.compute(predictions=decoded_preds, references=decoded_labels)\n",
1500
+ "    result = {\"bleu\": result[\"score\"]}\n",
1501
+ "\n",
1502
+ "    return result"
1504
+ ],
1505
+ "metadata": {
1506
+ "colab": {
1507
+ "base_uri": "https://localhost:8080/",
1508
+ "height": 49,
1509
+ "referenced_widgets": [
1510
+ "cc5509c288034ce6939f574301fd5eb2",
1511
+ "6eee983896f948f7919722045b190b01",
1512
+ "114a72cf85334713aefd3ee3616ebeb2",
1513
+ "4dfd352e599c4676b9871da14e8d75b5",
1514
+ "efa7962c591e4b4bb6c54f008444bb0f",
1515
+ "0169ffddc6014c2ebf92ee902e198e1a",
1516
+ "0e73212111294bdda9c79b97696aaa48",
1517
+ "6cf0f639236040b9ab8bc84665a1e94a",
1518
+ "ab1eb221a83e4e7299e5fe1056cbf4b8",
1519
+ "d13fccbfd59c43f687edb2aa0bdd6145",
1520
+ "06c209127feb497bb2e171d229183f16"
1521
+ ]
1522
+ },
1523
+ "id": "KoUWoPjQ8gjV",
1524
+ "outputId": "efd48518-f5a0-463a-ad3e-86ea3b856c4f"
1525
+ },
1526
+ "execution_count": 16,
1527
+ "outputs": [
1528
+ {
1529
+ "output_type": "display_data",
1530
+ "data": {
1531
+ "text/plain": [
1532
+ "Downloading builder script: 0.00B [00:00, ?B/s]"
1533
+ ],
1534
+ "application/vnd.jupyter.widget-view+json": {
1535
+ "version_major": 2,
1536
+ "version_minor": 0,
1537
+ "model_id": "cc5509c288034ce6939f574301fd5eb2"
1538
+ }
1539
+ },
1540
+ "metadata": {}
1541
+ }
1542
+ ]
1543
+ },
1544
+ {
1545
+ "cell_type": "markdown",
1546
+ "source": [
1547
+ "## 7. Training Setup"
1548
+ ],
1549
+ "metadata": {
1550
+ "id": "2B_-jhu58gjV"
1551
+ }
1552
+ },
1553
+ {
1554
+ "cell_type": "code",
1555
+ "source": [
1556
+ "from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer\n",
1557
+ "final_dir = \"/content/drive/MyDrive/llm-low-resource-translator/t5_multilingual/final\"\n",
1558
+ "training_args = Seq2SeqTrainingArguments(\n",
1559
+ " output_dir=final_dir,\n",
1560
+ " eval_strategy=\"epoch\",\n",
1561
+ " learning_rate=3e-4,\n",
1562
+ " per_device_train_batch_size=16,\n",
1563
+ " per_device_eval_batch_size=16,\n",
1564
+ " weight_decay=0.01,\n",
1565
+ " save_total_limit=3,\n",
1566
+ " num_train_epochs=3,\n",
1567
+ " predict_with_generate=True,\n",
1568
+ " fp16=True,\n",
1569
+ " push_to_hub=False,\n",
1570
+ " logging_steps=100,\n",
1571
+ " report_to=\"none\"\n",
1572
+ ")\n",
1573
+ "\n",
1574
+ "trainer = Seq2SeqTrainer(\n",
1575
+ " model=model,\n",
1576
+ " args=training_args,\n",
1577
+ " train_dataset=tokenized_datasets[\"train\"],\n",
1578
+ " eval_dataset=tokenized_datasets[\"test\"],\n",
1579
+ "    processing_class=tokenizer,\n",
1580
+ " data_collator=data_collator,\n",
1581
+ " compute_metrics=compute_metrics\n",
1582
+ ")"
1583
+ ],
1584
+ "metadata": {
1585
+ "colab": {
1586
+ "base_uri": "https://localhost:8080/"
1587
+ },
1588
+ "id": "wSKrfrez8gjV",
1589
+ "outputId": "0a0f5b30-3386-4fed-e350-9c59e0c57f17"
1590
+ },
1591
+ "execution_count": 17,
1592
+ "outputs": [
1593
+ {
1594
+ "output_type": "stream",
1595
+ "name": "stderr",
1596
+ "text": [
1597
+ "/tmp/ipython-input-2730798205.py:19: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Seq2SeqTrainer.__init__`. Use `processing_class` instead.\n",
1598
+ " trainer = Seq2SeqTrainer(\n"
1599
+ ]
1600
+ }
1601
+ ]
1602
+ },
1603
+ {
1604
+ "cell_type": "markdown",
1605
+ "source": [
1606
+ "## 8. Train the Model"
1607
+ ],
1608
+ "metadata": {
1609
+ "id": "uyPpu8Aq8gjV"
1610
+ }
1611
+ },
1612
+ {
1613
+ "cell_type": "code",
1614
+ "source": [
1615
+ "trainer.train()"
1616
+ ],
1617
+ "metadata": {
1618
+ "colab": {
1619
+ "base_uri": "https://localhost:8080/",
1620
+ "height": 427
1621
+ },
1622
+ "id": "xKYHdZWq8gjV",
1623
+ "outputId": "b2bd7c13-d758-49e8-c54a-c7d70cd28e35"
1624
+ },
1625
+ "execution_count": 18,
1626
+ "outputs": [
1627
+ {
1628
+ "output_type": "stream",
1629
+ "name": "stderr",
1630
+ "text": [
1631
+ "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py:666: UserWarning: 'pin_memory' argument is set as true but no accelerator is found, then device pinned memory won't be used.\n",
1632
+ " warnings.warn(warn_msg)\n"
1633
+ ]
1634
+ },
1635
+ {
1636
+ "output_type": "display_data",
1637
+ "data": {
1638
+ "text/plain": [
1639
+ "<IPython.core.display.HTML object>"
1640
+ ],
1641
+ "text/html": [
1642
+ "\n",
1643
+ " <div>\n",
1644
+ " \n",
1645
+ " <progress value='136' max='13986' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
1646
+ " [ 136/13986 42:16 < 72:48:56, 0.05 it/s, Epoch 0.03/3]\n",
1647
+ " </div>\n",
1648
+ " <table border=\"1\" class=\"dataframe\">\n",
1649
+ " <thead>\n",
1650
+ " <tr style=\"text-align: left;\">\n",
1651
+ " <th>Epoch</th>\n",
1652
+ " <th>Training Loss</th>\n",
1653
+ " <th>Validation Loss</th>\n",
1654
+ " </tr>\n",
1655
+ " </thead>\n",
1656
+ " <tbody>\n",
1657
+ " </tbody>\n",
1658
+ "</table><p>"
1659
+ ]
1660
+ },
1661
+ "metadata": {}
1662
+ },
1663
+ {
1664
+ "output_type": "error",
1665
+ "ename": "KeyboardInterrupt",
1666
+ "evalue": "",
1667
+ "traceback": [
1668
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
1669
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
1670
+ "\u001b[0;32m/tmp/ipython-input-4032920361.py\u001b[0m in \u001b[0;36m<cell line: 0>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtrainer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
1671
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36mtrain\u001b[0;34m(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)\u001b[0m\n\u001b[1;32m 2323\u001b[0m \u001b[0mhf_hub_utils\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0menable_progress_bars\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2324\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2325\u001b[0;31m return inner_training_loop(\n\u001b[0m\u001b[1;32m 2326\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2327\u001b[0m \u001b[0mresume_from_checkpoint\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mresume_from_checkpoint\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
1672
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36m_inner_training_loop\u001b[0;34m(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)\u001b[0m\n\u001b[1;32m 2672\u001b[0m )\n\u001b[1;32m 2673\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mcontext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2674\u001b[0;31m \u001b[0mtr_loss_step\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtraining_step\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnum_items_in_batch\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2675\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2676\u001b[0m if (\n",
1673
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36mtraining_step\u001b[0;34m(***failed resolving arguments***)\u001b[0m\n\u001b[1;32m 4069\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"scale_wrt_gas\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4070\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 4071\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maccelerator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloss\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4072\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4073\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mloss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdetach\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
1674
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/accelerate/accelerator.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(self, loss, **kwargs)\u001b[0m\n\u001b[1;32m 2738\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlomo_backward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloss\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlearning_rate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2739\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2740\u001b[0;31m \u001b[0mloss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2741\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2742\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mset_trigger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
1675
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/torch/_tensor.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(self, gradient, retain_graph, create_graph, inputs)\u001b[0m\n\u001b[1;32m 645\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 646\u001b[0m )\n\u001b[0;32m--> 647\u001b[0;31m torch.autograd.backward(\n\u001b[0m\u001b[1;32m 648\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgradient\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 649\u001b[0m )\n",
1676
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/torch/autograd/__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)\u001b[0m\n\u001b[1;32m 352\u001b[0m \u001b[0;31m# some Python versions print out the first line of a multi-line function\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 353\u001b[0m \u001b[0;31m# calls in the traceback and some print out the last line\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 354\u001b[0;31m _engine_run_backward(\n\u001b[0m\u001b[1;32m 355\u001b[0m \u001b[0mtensors\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 356\u001b[0m \u001b[0mgrad_tensors_\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
1677
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/torch/autograd/graph.py\u001b[0m in \u001b[0;36m_engine_run_backward\u001b[0;34m(t_outputs, *args, **kwargs)\u001b[0m\n\u001b[1;32m 827\u001b[0m \u001b[0munregister_hooks\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_register_logging_hooks_on_whole_graph\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mt_outputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 828\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 829\u001b[0;31m return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass\n\u001b[0m\u001b[1;32m 830\u001b[0m \u001b[0mt_outputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 831\u001b[0m ) # Calls into the C++ engine to run the backward pass\n",
1678
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
1679
+ ]
1680
+ }
1681
+ ]
1682
+ },
1683
+ {
1684
+ "cell_type": "markdown",
1685
+ "source": [
1686
+ "## 9. Inference Example"
1687
+ ],
1688
+ "metadata": {
1689
+ "id": "7NiOt4SY8gjW"
1690
+ }
1691
+ },
1692
+ {
1693
+ "cell_type": "code",
1694
+ "source": [
1695
+ "text = \"Hello, how are you today? I hope you're doing well.\"\n",
1695
+ "target_lang = \"cy\"  # must match a code used in training: \"br\", \"abk\", or \"cy\"\n",
1696
+ "inputs = tokenizer(f\"translate English to {target_lang}: {text}\", return_tensors=\"pt\").to(model.device)\n",
1697
+ "generated_ids = model.generate(\n",
1698
+ "    **inputs,  # pass input_ids together with the attention mask\n",
1699
+ "    max_length=128,\n",
1700
+ "    num_beams=5,\n",
1701
+ "    early_stopping=True\n",
1702
+ ")\n",
1703
+ "print(generated_ids)\n",
1704
+ "translation = tokenizer.decode(generated_ids[0], skip_special_tokens=True)\n",
1705
+ "print(\"English:\", text)\n",
1706
+ "print(f\"Translation ({target_lang}):\", translation)"
1708
+ ],
1709
+ "metadata": {
1710
+ "id": "YKCXr6Ap8gjW"
1711
+ },
1712
+ "execution_count": null,
1713
+ "outputs": []
1714
+ },
1715
+ {
1716
+ "cell_type": "markdown",
1717
+ "source": [
1718
+ "## 10. Save Model (Optional)"
1719
+ ],
1720
+ "metadata": {
1721
+ "id": "Y_nlerIE8gjW"
1722
+ }
1723
+ },
1724
+ {
1725
+ "cell_type": "code",
1726
+ "source": [
1727
+ "import os\n",
1728
+ "\n",
1729
+ "os.makedirs(final_dir, exist_ok=True)\n",
1730
+ "\n",
1731
+ "trainer.save_model(final_dir)\n",
1732
+ "tokenizer.save_pretrained(final_dir)\n",
1733
+ "print(f\"Saved model and tokenizer to: {final_dir}\")\n"
1734
+ ],
1735
+ "metadata": {
1736
+ "colab": {
1737
+ "base_uri": "https://localhost:8080/"
1738
+ },
1739
+ "id": "guMiVDeT8gjW",
1740
+ "outputId": "1ab58e4d-2dd6-441b-ef48-bec14a4a445e"
1741
+ },
1742
+ "execution_count": 34,
1743
+ "outputs": [
1744
+ {
1745
+ "output_type": "execute_result",
1746
+ "data": {
1747
+ "text/plain": [
1748
+ "('./mt5-en-de-finetuned/tokenizer_config.json',\n",
1749
+ " './mt5-en-de-finetuned/special_tokens_map.json',\n",
1750
+ " './mt5-en-de-finetuned/spiece.model',\n",
1751
+ " './mt5-en-de-finetuned/added_tokens.json')"
1752
+ ]
1753
+ },
1754
+ "metadata": {},
1755
+ "execution_count": 34
1756
+ }
1757
+ ]
1758
+ },
1759
+ {
1760
+ "cell_type": "markdown",
1761
+ "source": [
1762
+ "---\n",
1763
+ "\n",
1764
+ "**Done!** You now have a fine-tuned mT5 model for **English → German** translation.\n",
1765
+ "\n",
1766
+ "To adapt the notebook to **another language pair**, change:\n",
1767
+ "- `wmt16` → another parallel corpus (e.g., `opus100`; note that `flores200` is an evaluation benchmark without a training split)\n",
1768
+ "- `source_lang`, `target_lang` keys\n",
1769
+ "- The language-pair config passed to `load_dataset()` (e.g., `de-en` for `wmt16`)\n",
1770
+ "\n",
1771
+ "The same steps should carry over to **low-resource languages** (e.g., Swahili, Quechua), provided a parallel corpus for the pair is available."
1772
+ ],
1773
+ "metadata": {
1774
+ "id": "4RwTey1F8gjW"
1775
+ }
1776
+ }
1777
+ ]
1778
+ }