1882e76605485bae427976f8e42669c2

This model is a fine-tuned version of google/umt5-small on the de-fr (German-French) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1014
  • Data Size: 1.0
  • Epoch Runtime: 138.2293 seconds
  • Bleu: 7.0305
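
The checkpoint can be loaded for inference in the usual way. Below is a minimal sketch; the Hub id is taken from this card, but the exact input format used during fine-tuning (e.g. whether a task prefix was prepended) is not documented, so a plain German source sentence is assumed.

```python
# Minimal inference sketch. Assumption: a plain source sentence as input;
# the fine-tuning prompt format is not documented in this card.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/1882e76605485bae427976f8e42669c2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

src = "Der kleine Prinz setzte sich auf einen Stein."  # German source
inputs = tokenizer(src, return_tensors="pt")
output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # French output
```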

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
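
The card leaves this section blank, but the dataset named above can be inspected as follows. This is only a sketch: opus_books ships a single train split, and how it was divided into training and evaluation sets for this run is not documented.

```python
# Sketch of loading the de-fr configuration of opus_books; the
# train/eval split used for this fine-tuning run is not documented.
from datasets import load_dataset

books = load_dataset("Helsinki-NLP/opus_books", "de-fr")
print(books["train"][0]["translation"])  # {'de': '...', 'fr': '...'}
```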

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
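
For reference, these settings correspond roughly to the Seq2SeqTrainingArguments sketched below. This is a hypothetical reconstruction, since the actual training script is not included in the card; the output_dir name is invented.

```python
# Hypothetical reconstruction of the hyperparameters listed above;
# the actual training script is not part of this card.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-de-fr",  # invented name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    seed=42,
    num_train_epochs=50,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    predict_with_generate=True,      # required to compute BLEU at eval time
)
```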

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime (s) | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-----------------:|:------:|
| No log        | 0     | 0     | 15.2819         | 0         | 12.1977           | 0.1157 |
| No log        | 1     | 872   | 14.0570         | 0.0078    | 13.7404           | 0.1506 |
| No log        | 2     | 1744  | 12.5783         | 0.0156    | 14.3323           | 0.1256 |
| 0.2768        | 3     | 2616  | 10.6469         | 0.0312    | 16.6890           | 0.1587 |
| 0.8402        | 4     | 3488  | 7.5471          | 0.0625    | 20.3658           | 0.2483 |
| 8.7522        | 5     | 4360  | 5.0443          | 0.125     | 28.8127           | 0.6140 |
| 5.6582        | 6     | 5232  | 3.9904          | 0.25      | 44.6548           | 1.9767 |
| 4.526         | 7     | 6104  | 3.3695          | 0.5       | 75.0160           | 1.6519 |
| 3.8898        | 8     | 6976  | 2.9352          | 1.0       | 138.2087          | 2.7400 |
| 3.6109        | 9     | 7848  | 2.7868          | 1.0       | 138.0720          | 3.2548 |
| 3.4636        | 10    | 8720  | 2.6965          | 1.0       | 140.5587          | 3.5898 |
| 3.3409        | 11    | 9592  | 2.6354          | 1.0       | 139.4747          | 3.8679 |
| 3.2077        | 12    | 10464 | 2.5838          | 1.0       | 139.4930          | 4.0654 |
| 3.1691        | 13    | 11336 | 2.5415          | 1.0       | 139.2214          | 4.3071 |
| 3.0689        | 14    | 12208 | 2.5128          | 1.0       | 139.4797          | 4.4992 |
| 3.01          | 15    | 13080 | 2.4731          | 1.0       | 139.5060          | 4.6686 |
| 2.9516        | 16    | 13952 | 2.4451          | 1.0       | 141.0948          | 4.8355 |
| 2.8595        | 17    | 14824 | 2.4169          | 1.0       | 140.2607          | 4.9649 |
| 2.8268        | 18    | 15696 | 2.3933          | 1.0       | 139.4680          | 5.0970 |
| 2.8319        | 19    | 16568 | 2.3745          | 1.0       | 141.9063          | 5.1924 |
| 2.7745        | 20    | 17440 | 2.3620          | 1.0       | 139.9242          | 5.3077 |
| 2.7716        | 21    | 18312 | 2.3411          | 1.0       | 141.6721          | 5.3964 |
| 2.7218        | 22    | 19184 | 2.3173          | 1.0       | 145.8132          | 5.5163 |
| 2.6581        | 23    | 20056 | 2.3053          | 1.0       | 145.6531          | 5.5892 |
| 2.6342        | 24    | 20928 | 2.2878          | 1.0       | 146.5871          | 5.6683 |
| 2.5885        | 25    | 21800 | 2.2814          | 1.0       | 146.2511          | 5.7737 |
| 2.5772        | 26    | 22672 | 2.2685          | 1.0       | 145.9510          | 5.8343 |
| 2.5563        | 27    | 23544 | 2.2579          | 1.0       | 144.9519          | 5.9332 |
| 2.5135        | 28    | 24416 | 2.2527          | 1.0       | 145.6331          | 6.0058 |
| 2.4737        | 29    | 25288 | 2.2346          | 1.0       | 145.9113          | 6.0698 |
| 2.509         | 30    | 26160 | 2.2268          | 1.0       | 144.7014          | 6.1406 |
| 2.473         | 31    | 27032 | 2.2114          | 1.0       | 144.6324          | 6.2047 |
| 2.4271        | 32    | 27904 | 2.2087          | 1.0       | 144.0634          | 6.2594 |
| 2.4121        | 33    | 28776 | 2.1971          | 1.0       | 144.3472          | 6.3215 |
| 2.3709        | 34    | 29648 | 2.1846          | 1.0       | 144.6133          | 6.3849 |
| 2.3713        | 35    | 30520 | 2.1893          | 1.0       | 144.6684          | 6.4302 |
| 2.3675        | 36    | 31392 | 2.1726          | 1.0       | 145.7501          | 6.4806 |
| 2.3349        | 37    | 32264 | 2.1665          | 1.0       | 144.5169          | 6.5537 |
| 2.3164        | 38    | 33136 | 2.1596          | 1.0       | 144.5251          | 6.5644 |
| 2.2996        | 39    | 34008 | 2.1523          | 1.0       | 144.7580          | 6.6312 |
| 2.2524        | 40    | 34880 | 2.1485          | 1.0       | 145.3244          | 6.6413 |
| 2.2727        | 41    | 35752 | 2.1414          | 1.0       | 145.4038          | 6.6935 |
| 2.239         | 42    | 36624 | 2.1388          | 1.0       | 143.3335          | 6.7214 |
| 2.2324        | 43    | 37496 | 2.1298          | 1.0       | 143.7957          | 6.7895 |
| 2.2502        | 44    | 38368 | 2.1236          | 1.0       | 144.5139          | 6.8473 |
| 2.2186        | 45    | 39240 | 2.1221          | 1.0       | 145.2851          | 6.8749 |
| 2.1839        | 46    | 40112 | 2.1160          | 1.0       | 139.0024          | 6.8615 |
| 2.1547        | 47    | 40984 | 2.1130          | 1.0       | 138.4339          | 6.9186 |
| 2.1398        | 48    | 41856 | 2.1049          | 1.0       | 138.8303          | 6.9564 |
| 2.151         | 49    | 42728 | 2.1017          | 1.0       | 137.8470          | 6.9989 |
| 2.0868        | 50    | 43600 | 2.1014          | 1.0       | 138.2293          | 7.0305 |
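
The card does not state which BLEU implementation produced the scores in the Bleu column; a common choice in Trainer-based translation runs is sacreBLEU via the evaluate library, sketched below with placeholder sentences.

```python
# Sketch of scoring generated translations with sacreBLEU via `evaluate`;
# which BLEU implementation produced the table above is an assumption.
import evaluate

metric = evaluate.load("sacrebleu")
predictions = ["Le petit prince s'assit sur une pierre."]   # model outputs
references = [["Le petit prince s'assit sur une pierre."]]  # one reference list per prediction
result = metric.compute(predictions=predictions, references=references)
print(f"BLEU: {result['score']:.4f}")  # 0-100 scale, as in the table above
```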

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1