edcdeeeb6ee9c1caa2b5ad69566c650c

This model is a fine-tuned version of google/umt5-small on the en-pt (English to Portuguese) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 2.0008
  • Data Size: 1.0 (fraction of the training set in use)
  • Epoch Runtime: 8.8101 s
  • BLEU: 10.6357
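
For reference, a minimal sketch of running this checkpoint with the standard transformers seq2seq API, assuming the published repo id contemmcm/edcdeeeb6ee9c1caa2b5ad69566c650c. The input format is an assumption, since the card does not say whether any task prefix was applied during preprocessing.

```python
# Minimal inference sketch for this checkpoint. The input format is an
# assumption; the card does not document how inputs were preprocessed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/edcdeeeb6ee9c1caa2b5ad69566c650c"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# English source sentence; the model was fine-tuned on en-pt book translations.
inputs = tokenizer("The old man closed the book and looked at the sea.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```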

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
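
Although the card leaves this section blank, the dataset named in the summary can be loaded directly with the datasets library. A minimal sketch; the train/validation split shown is purely illustrative, since the card does not document how the data was split.

```python
# Sketch of loading the dataset named in this card. opus_books ships only a
# "train" split; the 90/10 split below is illustrative, not the card's setup.
from datasets import load_dataset

ds = load_dataset("Helsinki-NLP/opus_books", "en-pt", split="train")
ds = ds.train_test_split(test_size=0.1, seed=42)

# Each record holds an {"en": ..., "pt": ...} pair under the "translation" key.
print(ds["train"][0]["translation"])
```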

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto the Trainer API follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
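
A sketch of how these settings map onto transformers Seq2SeqTrainingArguments, assuming the standard Trainer API was used. The actual training script is not included in the card, and the output_dir name is hypothetical.

```python
# Reconstruction of the hyperparameters above as Seq2SeqTrainingArguments.
# The output_dir name is hypothetical; everything else mirrors the list.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-en-pt",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,    # x 4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",             # AdamW; betas=(0.9, 0.999), eps=1e-08 are the defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```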

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | BLEU |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 16.2919 | 0 | 1.2201 | 0.4835 |
| No log | 1 | 35 | 16.1880 | 0.0078 | 1.5263 | 0.7499 |
| No log | 2 | 70 | 16.0658 | 0.0156 | 1.7106 | 0.7946 |
| No log | 3 | 105 | 15.9703 | 0.0312 | 2.8809 | 0.6101 |
| No log | 4 | 140 | 15.9027 | 0.0625 | 3.6011 | 0.9109 |
| No log | 5 | 175 | 15.7632 | 0.125 | 4.1938 | 0.7681 |
| No log | 6 | 210 | 14.8865 | 0.25 | 4.6713 | 0.4877 |
| No log | 7 | 245 | 13.5589 | 0.5 | 6.0358 | 0.8687 |
| 3.2337 | 8.0 | 280 | 10.5702 | 1.0 | 9.2415 | 0.5636 |
| 13.3707 | 9.0 | 315 | 8.2162 | 1.0 | 8.6928 | 0.4153 |
| 10.0817 | 10.0 | 350 | 6.7948 | 1.0 | 8.6704 | 0.3472 |
| 10.0817 | 11.0 | 385 | 5.6901 | 1.0 | 8.9605 | 1.1900 |
| 8.1482 | 12.0 | 420 | 4.7732 | 1.0 | 8.9692 | 2.1952 |
| 7.0952 | 13.0 | 455 | 4.2501 | 1.0 | 9.7834 | 2.3122 |
| 7.0952 | 14.0 | 490 | 3.9309 | 1.0 | 6.8089 | 4.1742 |
| 6.3654 | 15.0 | 525 | 3.7403 | 1.0 | 7.1984 | 4.8892 |
| 5.828 | 16.0 | 560 | 3.5575 | 1.0 | 7.2448 | 6.9155 |
| 5.828 | 17.0 | 595 | 3.4114 | 1.0 | 7.5136 | 6.3853 |
| 5.4176 | 18.0 | 630 | 3.2819 | 1.0 | 7.4949 | 3.7225 |
| 5.0557 | 19.0 | 665 | 3.1687 | 1.0 | 7.4330 | 3.2310 |
| 4.7981 | 20.0 | 700 | 3.0727 | 1.0 | 7.4278 | 3.3432 |
| 4.7981 | 21.0 | 735 | 2.9720 | 1.0 | 7.5395 | 3.6462 |
| 4.5602 | 22.0 | 770 | 2.8854 | 1.0 | 7.9688 | 3.8492 |
| 4.3773 | 23.0 | 805 | 2.8101 | 1.0 | 7.9674 | 4.0627 |
| 4.3773 | 24.0 | 840 | 2.7270 | 1.0 | 8.0725 | 3.1310 |
| 4.1677 | 25.0 | 875 | 2.6455 | 1.0 | 8.4481 | 2.6736 |
| 4.0148 | 26.0 | 910 | 2.5817 | 1.0 | 8.9851 | 2.7965 |
| 4.0148 | 27.0 | 945 | 2.5152 | 1.0 | 6.7198 | 2.8878 |
| 3.842 | 28.0 | 980 | 2.4614 | 1.0 | 6.6934 | 2.9649 |
| 3.6842 | 29.0 | 1015 | 2.4123 | 1.0 | 6.7499 | 3.2196 |
| 3.6065 | 30.0 | 1050 | 2.3650 | 1.0 | 6.7672 | 12.7242 |
| 3.6065 | 31.0 | 1085 | 2.3169 | 1.0 | 7.2906 | 19.2524 |
| 3.4403 | 32.0 | 1120 | 2.2841 | 1.0 | 7.6503 | 19.2987 |
| 3.3772 | 33.0 | 1155 | 2.2433 | 1.0 | 7.8124 | 11.8892 |
| 3.3772 | 34.0 | 1190 | 2.2220 | 1.0 | 7.5171 | 9.8309 |
| 3.2671 | 35.0 | 1225 | 2.1913 | 1.0 | 7.3929 | 9.5611 |
| 3.1826 | 36.0 | 1260 | 2.1712 | 1.0 | 7.9977 | 9.6651 |
| 3.1826 | 37.0 | 1295 | 2.1564 | 1.0 | 7.7877 | 9.7081 |
| 3.1156 | 38.0 | 1330 | 2.1336 | 1.0 | 7.8609 | 9.7492 |
| 3.0241 | 39.0 | 1365 | 2.1186 | 1.0 | 8.5171 | 9.7944 |
| 2.963 | 40.0 | 1400 | 2.1051 | 1.0 | 7.1898 | 9.8516 |
| 2.963 | 41.0 | 1435 | 2.0857 | 1.0 | 7.4999 | 9.9685 |
| 2.903 | 42.0 | 1470 | 2.0740 | 1.0 | 7.7822 | 10.1261 |
| 2.8212 | 43.0 | 1505 | 2.0663 | 1.0 | 7.8828 | 10.2682 |
| 2.8212 | 44.0 | 1540 | 2.0470 | 1.0 | 7.9964 | 10.3864 |
| 2.7466 | 45.0 | 1575 | 2.0456 | 1.0 | 8.0995 | 10.3145 |
| 2.7288 | 46.0 | 1610 | 2.0331 | 1.0 | 7.9713 | 10.5252 |
| 2.7288 | 47.0 | 1645 | 2.0220 | 1.0 | 8.2568 | 10.4841 |
| 2.6657 | 48.0 | 1680 | 2.0138 | 1.0 | 8.1586 | 10.5815 |
| 2.6303 | 49.0 | 1715 | 2.0024 | 1.0 | 8.3825 | 10.6827 |
| 2.562 | 50.0 | 1750 | 2.0008 | 1.0 | 8.8101 | 10.6357 |
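
The card does not state which BLEU implementation produced these scores. A common way to compute comparable numbers is sacreBLEU via the evaluate library; a sketch under that assumption, with placeholder strings standing in for model outputs and references.

```python
# Sketch of scoring translations with sacreBLEU via the `evaluate` library.
# Which BLEU implementation the card used is unknown; this is an assumption.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["O velho fechou o livro e olhou para o mar."]   # model outputs
references = [["O velho fechou o livro e olhou para o mar."]]  # gold references
print(bleu.compute(predictions=predictions, references=references)["score"])
```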

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1