64264194fc2a4f42ecc1f95eec56730e

This model is a fine-tuned version of google/mt5-xl on the Helsinki-NLP/opus_books [en-pt] dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9175
  • Data Size: 1.0
  • Epoch Runtime: 39.3368
  • Bleu: 18.8538
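Since this is an mT5 seq2seq checkpoint fine-tuned for English-to-Portuguese translation, it should load with the standard Transformers translation pipeline. A minimal inference sketch (the repo id is taken from this card's title; note the checkpoint is several GB, so this downloads and loads a large model):

```python
from transformers import pipeline

# Hypothetical usage sketch: mT5 fine-tunes are seq2seq models, so the
# generic translation pipeline applies. Downloading the checkpoint
# requires several GB of disk space and memory.
translator = pipeline(
    "translation_en_to_pt",
    model="contemmcm/64264194fc2a4f42ecc1f95eec56730e",
)
result = translator("The book is on the table.")
print(result[0]["translation_text"])
```
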

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
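The total batch sizes above follow from the per-device batch size multiplied by the number of devices (no gradient accumulation is listed, so the multiplier is 1). A quick check:

```python
# Effective batch size = per-device batch size x number of devices.
train_batch_size = 8
eval_batch_size = 8
num_devices = 4

total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 32, matching total_train_batch_size above
print(total_eval_batch_size)   # 32, matching total_eval_batch_size above
```
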

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0    | 6.7874          | 0         | 1.8337        | 0.0492  |
| No log        | 1     | 35   | 5.7017          | 0.0078    | 2.5081        | 0.0714  |
| No log        | 2     | 70   | 4.2507          | 0.0156    | 6.4993        | 0.1528  |
| No log        | 3     | 105  | 3.2339          | 0.0312    | 14.5468       | 0.4578  |
| No log        | 4     | 140  | 2.5033          | 0.0625    | 22.9918       | 1.0598  |
| No log        | 5     | 175  | 2.0593          | 0.125     | 28.6541       | 1.6707  |
| No log        | 6     | 210  | 1.6270          | 0.25      | 22.2383       | 2.1778  |
| No log        | 7     | 245  | 1.1479          | 0.5       | 25.1250       | 3.0213  |
| 0.5296        | 8.0   | 280  | 0.9210          | 1.0       | 41.2080       | 15.8497 |
| 1.2825        | 9.0   | 315  | 0.8598          | 1.0       | 42.7891       | 16.0894 |
| 0.9729        | 10.0  | 350  | 0.8443          | 1.0       | 30.9760       | 16.9181 |
| 0.9729        | 11.0  | 385  | 0.8413          | 1.0       | 33.6040       | 16.8035 |
| 0.7696        | 12.0  | 420  | 0.8547          | 1.0       | 38.2172       | 17.3521 |
| 0.6304        | 13.0  | 455  | 0.8768          | 1.0       | 31.0604       | 18.1505 |
| 0.6304        | 14.0  | 490  | 0.9103          | 1.0       | 33.5337       | 17.0979 |
| 0.5147        | 15.0  | 525  | 0.9175          | 1.0       | 39.3368       | 18.8538 |
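The BLEU column is the geometric mean of modified n-gram precisions (n = 1..4) scaled by a brevity penalty. The sketch below shows the core formula in plain Python; it is not the exact scorer used for this card (real implementations such as sacreBLEU add specific tokenization and smoothing), so scores will differ slightly:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) times a brevity penalty, on a 0-100 scale."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = Counter(ngrams(ref, n))
            hyp_counts = Counter(ngrams(hyp, n))
            # Clipping: each hypothesis n-gram is credited at most as
            # many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    log_precision = sum(math.log(m / t)
                        for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return 100 * bp * math.exp(log_precision)


ref = "the cat sat on the mat".split()
print(round(corpus_bleu([ref], [ref]), 1))  # 100.0 for a perfect match
```
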

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1