0f5b17592027cb4e4a816d79557b1911

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [en-nl] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3695
  • Data Size: 1.0
  • Epoch Runtime: 152.9586
  • Bleu: 6.5964
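
A minimal usage sketch for loading the checkpoint and translating a sentence from English to Dutch. The repository id is taken from this card's page; whether the training script used a task prefix is not documented, so the plain-input call below is an assumption.

```python
# Minimal inference sketch (repo id assumed from this card; adjust as needed).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/0f5b17592027cb4e4a816d79557b1911"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "The book was lying open on the table."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```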

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
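
A sketch of how the hyperparameters above could be expressed as `Seq2SeqTrainingArguments`; the original training script is not provided, and `output_dir` is a placeholder rather than a value from the actual run.

```python
# Sketch of training arguments mirroring the listed hyperparameters
# (output_dir and any unlisted options are placeholders, not from the original run).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-en-nl",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # generate outputs so BLEU can be computed at eval time
)
```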

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---------------|-------|------|-----------------|-----------|---------------|------|
| No log | 0 | 0 | 15.9254 | 0 | 13.2639 | 0.2464 |
| No log | 1 | 966 | 15.1955 | 0.0078 | 14.4259 | 0.2560 |
| No log | 2 | 1932 | 13.7094 | 0.0156 | 15.9085 | 0.2756 |
| 0.3741 | 3 | 2898 | 11.4368 | 0.0312 | 18.2524 | 0.2662 |
| 12.1281 | 4 | 3864 | 7.6606 | 0.0625 | 22.6134 | 0.3354 |
| 8.1543 | 5 | 4830 | 5.4229 | 0.125 | 31.1433 | 0.7218 |
| 5.7585 | 6 | 5796 | 4.2528 | 0.25 | 48.2378 | 2.6981 |
| 4.7426 | 7 | 6762 | 3.6188 | 0.5 | 81.8670 | 2.1389 |
| 4.1624 | 8.0 | 7728 | 3.2714 | 1.0 | 150.3053 | 3.0444 |
| 3.9018 | 9.0 | 8694 | 3.1277 | 1.0 | 149.6269 | 3.4989 |
| 3.7533 | 10.0 | 9660 | 3.0298 | 1.0 | 149.3464 | 3.7639 |
| 3.5774 | 11.0 | 10626 | 2.9638 | 1.0 | 149.5916 | 4.0540 |
| 3.5013 | 12.0 | 11592 | 2.9118 | 1.0 | 147.9367 | 4.2137 |
| 3.3939 | 13.0 | 12558 | 2.8668 | 1.0 | 146.5442 | 4.3870 |
| 3.3578 | 14.0 | 13524 | 2.8287 | 1.0 | 149.3875 | 4.5341 |
| 3.2786 | 15.0 | 14490 | 2.7896 | 1.0 | 149.3572 | 4.6710 |
| 3.2392 | 16.0 | 15456 | 2.7568 | 1.0 | 148.9842 | 4.7764 |
| 3.1845 | 17.0 | 16422 | 2.7364 | 1.0 | 150.1884 | 4.8800 |
| 3.1296 | 18.0 | 17388 | 2.7068 | 1.0 | 149.8945 | 5.0243 |
| 3.1311 | 19.0 | 18354 | 2.6832 | 1.0 | 148.9657 | 5.1445 |
| 3.1132 | 20.0 | 19320 | 2.6625 | 1.0 | 149.4043 | 5.1978 |
| 3.0214 | 21.0 | 20286 | 2.6432 | 1.0 | 149.5657 | 5.2519 |
| 3.016 | 22.0 | 21252 | 2.6233 | 1.0 | 150.0480 | 5.3606 |
| 2.9664 | 23.0 | 22218 | 2.6075 | 1.0 | 150.8476 | 5.4040 |
| 2.9153 | 24.0 | 23184 | 2.5908 | 1.0 | 150.1463 | 5.5002 |
| 2.8783 | 25.0 | 24150 | 2.5782 | 1.0 | 148.1497 | 5.5732 |
| 2.8666 | 26.0 | 25116 | 2.5640 | 1.0 | 147.8534 | 5.6270 |
| 2.8334 | 27.0 | 26082 | 2.5496 | 1.0 | 148.1002 | 5.6465 |
| 2.8429 | 28.0 | 27048 | 2.5350 | 1.0 | 149.7204 | 5.7392 |
| 2.7873 | 29.0 | 28014 | 2.5227 | 1.0 | 148.0967 | 5.7758 |
| 2.7577 | 30.0 | 28980 | 2.5112 | 1.0 | 149.1964 | 5.8476 |
| 2.7046 | 31.0 | 29946 | 2.5076 | 1.0 | 150.6974 | 5.8756 |
| 2.7203 | 32.0 | 30912 | 2.4945 | 1.0 | 150.1795 | 5.9264 |
| 2.7432 | 33.0 | 31878 | 2.4899 | 1.0 | 148.6915 | 5.9398 |
| 2.6346 | 34.0 | 32844 | 2.4738 | 1.0 | 149.3140 | 6.0006 |
| 2.6742 | 35.0 | 33810 | 2.4651 | 1.0 | 150.2584 | 6.0440 |
| 2.6665 | 36.0 | 34776 | 2.4546 | 1.0 | 149.3136 | 6.1093 |
| 2.6591 | 37.0 | 35742 | 2.4435 | 1.0 | 149.4992 | 6.1513 |
| 2.6097 | 38.0 | 36708 | 2.4378 | 1.0 | 150.3728 | 6.1945 |
| 2.554 | 39.0 | 37674 | 2.4401 | 1.0 | 148.6280 | 6.2043 |
| 2.5815 | 40.0 | 38640 | 2.4343 | 1.0 | 149.8191 | 6.2463 |
| 2.5328 | 41.0 | 39606 | 2.4254 | 1.0 | 149.9701 | 6.2778 |
| 2.5702 | 42.0 | 40572 | 2.4145 | 1.0 | 149.9797 | 6.3117 |
| 2.5287 | 43.0 | 41538 | 2.4093 | 1.0 | 148.9173 | 6.3228 |
| 2.468 | 44.0 | 42504 | 2.4027 | 1.0 | 150.7585 | 6.4079 |
| 2.5006 | 45.0 | 43470 | 2.4004 | 1.0 | 150.8242 | 6.4404 |
| 2.4746 | 46.0 | 44436 | 2.3897 | 1.0 | 150.2094 | 6.4408 |
| 2.4475 | 47.0 | 45402 | 2.3881 | 1.0 | 150.0153 | 6.4717 |
| 2.4588 | 48.0 | 46368 | 2.3873 | 1.0 | 152.1674 | 6.5224 |
| 2.4383 | 49.0 | 47334 | 2.3768 | 1.0 | 150.8109 | 6.5423 |
| 2.4305 | 50.0 | 48300 | 2.3695 | 1.0 | 152.9586 | 6.5964 |
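
The card does not state exactly how the reported BLEU scores were computed; the sketch below shows one common approach using the `evaluate` library's sacrebleu metric, and should be treated as an assumption rather than the original evaluation setup.

```python
# Sketch of BLEU computation with evaluate/sacrebleu (assumed setup, not the original script).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Het boek lag open op de tafel."]           # model outputs (example)
references = [["Het boek lag opengeslagen op de tafel."]]  # gold translations (example)
print(bleu.compute(predictions=predictions, references=references)["score"])
```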

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1