fc1675c452f6ade17e19d10e66fc1dc8

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [de-it] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5109
  • Data Size: 1.0
  • Epoch Runtime: 107.4434
  • Bleu: 5.3511
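
As a quick start, the sketch below loads the checkpoint with the Transformers API and translates one German sentence to Italian. The example sentence and generation settings are illustrative assumptions; the card does not document the input format used during fine-tuning.

```python
# Minimal usage sketch (assumes no special task prefix is required;
# the fine-tuning input format is not documented in this card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/fc1675c452f6ade17e19d10e66fc1dc8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# German -> Italian, short greedy generation for the example
inputs = tokenizer("Der Himmel ist blau.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```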

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
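
The evaluation split is not documented. For orientation, the de-it configuration of the dataset named above can be loaded as sketched below; the 90/10 split is an assumption for illustration, since opus_books ships only a train split.

```python
# Sketch: loading the dataset named in this card. The eval split is
# an assumption; opus_books provides only a "train" split.
from datasets import load_dataset

books = load_dataset("Helsinki-NLP/opus_books", "de-it")
splits = books["train"].train_test_split(test_size=0.1, seed=42)  # hypothetical split

pair = splits["train"][0]["translation"]
print(pair["de"], "->", pair["it"])
```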

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a Seq2SeqTrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
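
For reference, these settings map onto Seq2SeqTrainingArguments roughly as sketched below. The output_dir is a hypothetical name, and the 4-GPU launch and the progressive data sizing visible in the results table are handled by the launcher and training script, not by these arguments.

```python
# Sketch mapping the listed hyperparameters to Transformers
# Seq2SeqTrainingArguments; output_dir is a hypothetical name.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-de-it",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 8 per device x 4 GPUs = 32 total
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```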

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 18.2126 | 0 | 9.7229 | 0.2821 |
| No log | 1 | 684 | 17.1843 | 0.0078 | 10.4601 | 0.2857 |
| No log | 2 | 1368 | 15.8314 | 0.0156 | 11.3541 | 0.3103 |
| No log | 3 | 2052 | 12.9768 | 0.0312 | 13.2571 | 0.3516 |
| No log | 4 | 2736 | 9.6351 | 0.0625 | 16.1274 | 0.3873 |
| 9.9321 | 5 | 3420 | 6.6460 | 0.125 | 22.3163 | 0.2937 |
| 6.6798 | 6 | 4104 | 4.6406 | 0.25 | 34.2147 | 1.1370 |
| 5.324 | 7 | 4788 | 3.9731 | 0.5 | 58.2977 | 0.9879 |
| 4.4751 | 8.0 | 5472 | 3.4211 | 1.0 | 105.3376 | 1.7759 |
| 4.144 | 9.0 | 6156 | 3.2472 | 1.0 | 107.5736 | 2.2586 |
| 3.9925 | 10.0 | 6840 | 3.1491 | 1.0 | 105.1886 | 2.5878 |
| 3.8374 | 11.0 | 7524 | 3.0726 | 1.0 | 105.8710 | 2.8416 |
| 3.7373 | 12.0 | 8208 | 3.0195 | 1.0 | 107.2988 | 3.0379 |
| 3.607 | 13.0 | 8892 | 2.9684 | 1.0 | 106.8523 | 3.2369 |
| 3.6247 | 14.0 | 9576 | 2.9359 | 1.0 | 106.6629 | 3.3520 |
| 3.5038 | 15.0 | 10260 | 2.8978 | 1.0 | 107.2863 | 3.4596 |
| 3.4472 | 16.0 | 10944 | 2.8660 | 1.0 | 107.9750 | 3.5886 |
| 3.381 | 17.0 | 11628 | 2.8427 | 1.0 | 108.8808 | 3.6989 |
| 3.3438 | 18.0 | 12312 | 2.8194 | 1.0 | 107.1700 | 3.7762 |
| 3.3045 | 19.0 | 12996 | 2.7957 | 1.0 | 107.2615 | 3.8943 |
| 3.2534 | 20.0 | 13680 | 2.7769 | 1.0 | 108.6581 | 4.0035 |
| 3.1617 | 21.0 | 14364 | 2.7602 | 1.0 | 108.4630 | 4.0996 |
| 3.1425 | 22.0 | 15048 | 2.7414 | 1.0 | 106.6954 | 4.1840 |
| 3.1421 | 23.0 | 15732 | 2.7296 | 1.0 | 106.6434 | 4.2296 |
| 3.1 | 24.0 | 16416 | 2.7118 | 1.0 | 107.8798 | 4.2947 |
| 3.0595 | 25.0 | 17100 | 2.6938 | 1.0 | 110.3338 | 4.3791 |
| 3.0455 | 26.0 | 17784 | 2.6815 | 1.0 | 106.4931 | 4.4342 |
| 3.0074 | 27.0 | 18468 | 2.6724 | 1.0 | 106.8790 | 4.4725 |
| 2.9806 | 28.0 | 19152 | 2.6584 | 1.0 | 110.1350 | 4.5494 |
| 2.9533 | 29.0 | 19836 | 2.6477 | 1.0 | 109.3596 | 4.5771 |
| 2.8919 | 30.0 | 20520 | 2.6434 | 1.0 | 107.8245 | 4.6223 |
| 2.89 | 31.0 | 21204 | 2.6337 | 1.0 | 107.7325 | 4.6883 |
| 2.9254 | 32.0 | 21888 | 2.6207 | 1.0 | 108.0603 | 4.7041 |
| 2.8883 | 33.0 | 22572 | 2.6148 | 1.0 | 110.0230 | 4.7649 |
| 2.8092 | 34.0 | 23256 | 2.6023 | 1.0 | 106.6470 | 4.8163 |
| 2.8076 | 35.0 | 23940 | 2.5957 | 1.0 | 107.1131 | 4.8667 |
| 2.8102 | 36.0 | 24624 | 2.5849 | 1.0 | 107.4508 | 4.8860 |
| 2.74 | 37.0 | 25308 | 2.5849 | 1.0 | 108.6266 | 4.9398 |
| 2.754 | 38.0 | 25992 | 2.5701 | 1.0 | 107.4365 | 4.9580 |
| 2.6915 | 39.0 | 26676 | 2.5653 | 1.0 | 107.4098 | 4.9882 |
| 2.682 | 40.0 | 27360 | 2.5630 | 1.0 | 108.5210 | 5.0310 |
| 2.6995 | 41.0 | 28044 | 2.5529 | 1.0 | 108.1337 | 5.0781 |
| 2.7046 | 42.0 | 28728 | 2.5460 | 1.0 | 107.4749 | 5.1326 |
| 2.6707 | 43.0 | 29412 | 2.5435 | 1.0 | 108.1177 | 5.1529 |
| 2.6692 | 44.0 | 30096 | 2.5349 | 1.0 | 108.4466 | 5.1935 |
| 2.6215 | 45.0 | 30780 | 2.5305 | 1.0 | 108.6350 | 5.1652 |
| 2.6401 | 46.0 | 31464 | 2.5295 | 1.0 | 107.5707 | 5.2422 |
| 2.6027 | 47.0 | 32148 | 2.5177 | 1.0 | 108.3829 | 5.2745 |
| 2.583 | 48.0 | 32832 | 2.5178 | 1.0 | 108.8079 | 5.3087 |
| 2.5528 | 49.0 | 33516 | 2.5134 | 1.0 | 108.3599 | 5.3373 |
| 2.5637 | 50.0 | 34200 | 2.5109 | 1.0 | 107.4434 | 5.3511 |
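
The Data Size column appears to report the fraction of the training set seen per epoch, doubling each epoch until the full set is reached at epoch 8, with Epoch Runtime growing accordingly. The BLEU figures can be reproduced in spirit with sacreBLEU; the sketch below uses the evaluate wrapper and made-up sentences, as the exact metric configuration is not documented.

```python
# Sketch: BLEU as plausibly computed for this card, via the
# `evaluate` sacrebleu wrapper (metric configuration is an assumption).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Il cielo è blu."]        # model output (illustrative)
references = [["Il cielo è azzurro."]]   # one reference list per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])
```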

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1