d8d8fd3ea3376c3aa3c9f3b5ae367e4d

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [en-fr] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6234
  • Data Size: 1.0
  • Epoch Runtime: 502.3375
  • Bleu: 11.6107
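
The checkpoint can be loaded with the standard seq2seq classes from transformers. Below is a minimal inference sketch; the hub repository id (taken from this page's model tree) and the absence of a task prefix during fine-tuning are assumptions, since neither is documented in this card.

```python
# Minimal English -> French inference sketch for this checkpoint.
# Assumptions: the repo id below is correct and no task prefix was used
# during fine-tuning (neither is confirmed by this model card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/d8d8fd3ea3376c3aa3c9f3b5ae367e4d"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("The sun was setting over the sea.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```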

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged training-arguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
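
The values above map onto Seq2SeqTrainingArguments roughly as sketched below. This is a reconstruction, not the original training script; the output directory, the per-epoch evaluation strategy, and predict_with_generate are assumptions.

```python
# Hedged reconstruction of the training setup from the hyperparameters above.
# The per-device batch size of 8 on 4 GPUs yields the total batch size of 32.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-en-fr",  # assumed name, not documented
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=50,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="epoch",       # assumption: evaluation once per epoch, matching the table below
    predict_with_generate=True,  # assumption: required to compute BLEU during evaluation
)
```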

Training results

| Training Loss | Epoch | Step   | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|---------------|-------|--------|-----------------|-----------|---------------|---------|
| No log        | 0     | 0      | 15.1108         | 0         | 40.9449       | 0.1801  |
| No log        | 1     | 3177   | 12.2470         | 0.0078    | 44.7462       | 0.1632  |
| 0.2482        | 2     | 6354   | 8.6497          | 0.0156    | 49.0083       | 0.2088  |
| 8.6722        | 3     | 9531   | 5.9141          | 0.0312    | 56.0694       | 0.3121  |
| 5.9354        | 4     | 12708  | 3.9223          | 0.0625    | 69.6133       | 3.2295  |
| 4.5942        | 5     | 15885  | 3.3008          | 0.125     | 98.3453       | 2.8829  |
| 3.8614        | 6     | 19062  | 2.8293          | 0.25      | 156.2967      | 4.0112  |
| 3.3935        | 7     | 22239  | 2.5678          | 0.5       | 266.4938      | 5.1411  |
| 3.0488        | 8.0   | 25416  | 2.3702          | 1.0       | 512.2830      | 6.2382  |
| 2.8234        | 9.0   | 28593  | 2.2452          | 1.0       | 518.0117      | 6.9661  |
| 2.7209        | 10.0  | 31770  | 2.1731          | 1.0       | 515.0344      | 7.4730  |
| 2.6035        | 11.0  | 34947  | 2.1031          | 1.0       | 521.1200      | 7.9125  |
| 2.5055        | 12.0  | 38124  | 2.0647          | 1.0       | 517.4778      | 8.2337  |
| 2.4309        | 13.0  | 41301  | 2.0198          | 1.0       | 516.4580      | 8.5100  |
| 2.3719        | 14.0  | 44478  | 1.9804          | 1.0       | 515.0311      | 8.7866  |
| 2.3223        | 15.0  | 47655  | 1.9540          | 1.0       | 520.2747      | 8.9727  |
| 2.2457        | 16.0  | 50832  | 1.9312          | 1.0       | 516.9345      | 9.1736  |
| 2.2272        | 17.0  | 54009  | 1.9029          | 1.0       | 516.1580      | 9.3535  |
| 2.2188        | 18.0  | 57186  | 1.8812          | 1.0       | 518.6792      | 9.5288  |
| 2.1583        | 19.0  | 60363  | 1.8677          | 1.0       | 519.5041      | 9.6595  |
| 2.0955        | 20.0  | 63540  | 1.8466          | 1.0       | 522.0993      | 9.7797  |
| 2.0809        | 21.0  | 66717  | 1.8308          | 1.0       | 507.6019      | 9.9145  |
| 2.0634        | 22.0  | 69894  | 1.8122          | 1.0       | 499.3597      | 10.0769 |
| 2.0399        | 23.0  | 73071  | 1.8028          | 1.0       | 502.4851      | 10.1335 |
| 2.0418        | 24.0  | 76248  | 1.7894          | 1.0       | 503.3499      | 10.2583 |
| 2.0029        | 25.0  | 79425  | 1.7749          | 1.0       | 503.2419      | 10.3534 |
| 1.9805        | 26.0  | 82602  | 1.7636          | 1.0       | 500.1708      | 10.4152 |
| 1.9643        | 27.0  | 85779  | 1.7547          | 1.0       | 501.3360      | 10.5282 |
| 1.9555        | 28.0  | 88956  | 1.7424          | 1.0       | 502.3105      | 10.5978 |
| 1.9327        | 29.0  | 92133  | 1.7414          | 1.0       | 502.9374      | 10.6517 |
| 1.9168        | 30.0  | 95310  | 1.7332          | 1.0       | 502.5499      | 10.6957 |
| 1.8928        | 31.0  | 98487  | 1.7253          | 1.0       | 500.4714      | 10.7808 |
| 1.8703        | 32.0  | 101664 | 1.7154          | 1.0       | 506.9298      | 10.8111 |
| 1.8468        | 33.0  | 104841 | 1.7063          | 1.0       | 503.4817      | 10.9205 |
| 1.8673        | 34.0  | 108018 | 1.7023          | 1.0       | 505.0646      | 11.0093 |
| 1.8186        | 35.0  | 111195 | 1.6962          | 1.0       | 502.6725      | 11.0144 |
| 1.7871        | 36.0  | 114372 | 1.6844          | 1.0       | 502.7248      | 11.0728 |
| 1.8113        | 37.0  | 117549 | 1.6838          | 1.0       | 505.2796      | 11.1395 |
| 1.7836        | 38.0  | 120726 | 1.6791          | 1.0       | 501.3685      | 11.1425 |
| 1.7603        | 39.0  | 123903 | 1.6714          | 1.0       | 502.4841      | 11.2427 |
| 1.7565        | 40.0  | 127080 | 1.6582          | 1.0       | 501.6685      | 11.2670 |
| 1.7209        | 41.0  | 130257 | 1.6633          | 1.0       | 508.2311      | 11.3440 |
| 1.7196        | 42.0  | 133434 | 1.6549          | 1.0       | 502.7122      | 11.3669 |
| 1.7148        | 43.0  | 136611 | 1.6542          | 1.0       | 499.5352      | 11.3845 |
| 1.6946        | 44.0  | 139788 | 1.6517          | 1.0       | 504.2522      | 11.4227 |
| 1.6908        | 45.0  | 142965 | 1.6450          | 1.0       | 501.0214      | 11.4683 |
| 1.6435        | 46.0  | 146142 | 1.6402          | 1.0       | 500.5124      | 11.5187 |
| 1.6486        | 47.0  | 149319 | 1.6317          | 1.0       | 503.3988      | 11.5462 |
| 1.6063        | 48.0  | 152496 | 1.6335          | 1.0       | 503.3127      | 11.5403 |
| 1.6474        | 49.0  | 155673 | 1.6255          | 1.0       | 503.6295      | 11.6011 |
| 1.6467        | 50.0  | 158850 | 1.6234          | 1.0       | 502.3375      | 11.6107 |
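
The Bleu column reflects a corpus-level sacrebleu score computed over generated translations at each evaluation. The exact evaluation split and decoding settings are not documented here; the sketch below only illustrates how such a score is obtained with the evaluate library, using made-up sentence pairs.

```python
# Minimal sketch of the BLEU computation behind the "Bleu" column.
# The actual evaluation data and decoding settings are not documented in this card;
# the sentence pairs below are placeholders for model outputs and references.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Le soleil se couchait sur la mer."]            # model outputs
references = [["Le soleil se couchait au-dessus de la mer."]]  # gold translations
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # corpus-level BLEU, comparable to the table values
```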

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1