1882e76605485bae427976f8e42669c2

This model is a fine-tuned version of google/umt5-small on the de-fr (German-French) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1014
  • Data Size: 1.0
  • Epoch Runtime: 138.2293 seconds
  • Bleu: 7.0305
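
The checkpoint can be loaded for inference in the usual way. Below is a minimal sketch; the Hub id is taken from this card, but the exact input format used during fine-tuning (e.g. whether a task prefix was prepended) is not documented, so a plain German source sentence is assumed.

```python
# Minimal inference sketch. Assumption: a plain source sentence as input;
# the fine-tuning prompt format is not documented in this card.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/1882e76605485bae427976f8e42669c2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

src = "Der kleine Prinz setzte sich auf einen Stein."  # German source
inputs = tokenizer(src, return_tensors="pt")
output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # French output
```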

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
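
The card leaves this section blank, but the dataset named above can be inspected as follows. This is only a sketch: opus_books ships a single train split, and how it was divided into training and evaluation sets for this run is not documented.

```python
# Sketch of loading the de-fr configuration of opus_books; the
# train/eval split used for this fine-tuning run is not documented.
from datasets import load_dataset

books = load_dataset("Helsinki-NLP/opus_books", "de-fr")
print(books["train"][0]["translation"])  # {'de': '...', 'fr': '...'}
```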

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
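
For reference, these settings correspond roughly to the Seq2SeqTrainingArguments sketched below. This is a hypothetical reconstruction, since the actual training script is not included in the card; the output_dir name is invented.

```python
# Hypothetical reconstruction of the hyperparameters listed above;
# the actual training script is not part of this card.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-de-fr",  # invented name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    seed=42,
    num_train_epochs=50,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    predict_with_generate=True,      # required to compute BLEU at eval time
)
```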

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime (s) | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-----------------:|:------:|
| No log        | 0     | 0     | 15.2819         | 0         | 12.1977           | 0.1157 |
| No log        | 1     | 872   | 14.0570         | 0.0078    | 13.7404           | 0.1506 |
| No log        | 2     | 1744  | 12.5783         | 0.0156    | 14.3323           | 0.1256 |
| 0.2768        | 3     | 2616  | 10.6469         | 0.0312    | 16.6890           | 0.1587 |
| 0.8402        | 4     | 3488  | 7.5471          | 0.0625    | 20.3658           | 0.2483 |
| 8.7522        | 5     | 4360  | 5.0443          | 0.125     | 28.8127           | 0.6140 |
| 5.6582        | 6     | 5232  | 3.9904          | 0.25      | 44.6548           | 1.9767 |
| 4.526         | 7     | 6104  | 3.3695          | 0.5       | 75.0160           | 1.6519 |
| 3.8898        | 8     | 6976  | 2.9352          | 1.0       | 138.2087          | 2.7400 |
| 3.6109        | 9     | 7848  | 2.7868          | 1.0       | 138.0720          | 3.2548 |
| 3.4636        | 10    | 8720  | 2.6965          | 1.0       | 140.5587          | 3.5898 |
| 3.3409        | 11    | 9592  | 2.6354          | 1.0       | 139.4747          | 3.8679 |
| 3.2077        | 12    | 10464 | 2.5838          | 1.0       | 139.4930          | 4.0654 |
| 3.1691        | 13    | 11336 | 2.5415          | 1.0       | 139.2214          | 4.3071 |
| 3.0689        | 14    | 12208 | 2.5128          | 1.0       | 139.4797          | 4.4992 |
| 3.01          | 15    | 13080 | 2.4731          | 1.0       | 139.5060          | 4.6686 |
| 2.9516        | 16    | 13952 | 2.4451          | 1.0       | 141.0948          | 4.8355 |
| 2.8595        | 17    | 14824 | 2.4169          | 1.0       | 140.2607          | 4.9649 |
| 2.8268        | 18    | 15696 | 2.3933          | 1.0       | 139.4680          | 5.0970 |
| 2.8319        | 19    | 16568 | 2.3745          | 1.0       | 141.9063          | 5.1924 |
| 2.7745        | 20    | 17440 | 2.3620          | 1.0       | 139.9242          | 5.3077 |
| 2.7716        | 21    | 18312 | 2.3411          | 1.0       | 141.6721          | 5.3964 |
| 2.7218        | 22    | 19184 | 2.3173          | 1.0       | 145.8132          | 5.5163 |
| 2.6581        | 23    | 20056 | 2.3053          | 1.0       | 145.6531          | 5.5892 |
| 2.6342        | 24    | 20928 | 2.2878          | 1.0       | 146.5871          | 5.6683 |
| 2.5885        | 25    | 21800 | 2.2814          | 1.0       | 146.2511          | 5.7737 |
| 2.5772        | 26    | 22672 | 2.2685          | 1.0       | 145.9510          | 5.8343 |
| 2.5563        | 27    | 23544 | 2.2579          | 1.0       | 144.9519          | 5.9332 |
| 2.5135        | 28    | 24416 | 2.2527          | 1.0       | 145.6331          | 6.0058 |
| 2.4737        | 29    | 25288 | 2.2346          | 1.0       | 145.9113          | 6.0698 |
| 2.509         | 30    | 26160 | 2.2268          | 1.0       | 144.7014          | 6.1406 |
| 2.473         | 31    | 27032 | 2.2114          | 1.0       | 144.6324          | 6.2047 |
| 2.4271        | 32    | 27904 | 2.2087          | 1.0       | 144.0634          | 6.2594 |
| 2.4121        | 33    | 28776 | 2.1971          | 1.0       | 144.3472          | 6.3215 |
| 2.3709        | 34    | 29648 | 2.1846          | 1.0       | 144.6133          | 6.3849 |
| 2.3713        | 35    | 30520 | 2.1893          | 1.0       | 144.6684          | 6.4302 |
| 2.3675        | 36    | 31392 | 2.1726          | 1.0       | 145.7501          | 6.4806 |
| 2.3349        | 37    | 32264 | 2.1665          | 1.0       | 144.5169          | 6.5537 |
| 2.3164        | 38    | 33136 | 2.1596          | 1.0       | 144.5251          | 6.5644 |
| 2.2996        | 39    | 34008 | 2.1523          | 1.0       | 144.7580          | 6.6312 |
| 2.2524        | 40    | 34880 | 2.1485          | 1.0       | 145.3244          | 6.6413 |
| 2.2727        | 41    | 35752 | 2.1414          | 1.0       | 145.4038          | 6.6935 |
| 2.239         | 42    | 36624 | 2.1388          | 1.0       | 143.3335          | 6.7214 |
| 2.2324        | 43    | 37496 | 2.1298          | 1.0       | 143.7957          | 6.7895 |
| 2.2502        | 44    | 38368 | 2.1236          | 1.0       | 144.5139          | 6.8473 |
| 2.2186        | 45    | 39240 | 2.1221          | 1.0       | 145.2851          | 6.8749 |
| 2.1839        | 46    | 40112 | 2.1160          | 1.0       | 139.0024          | 6.8615 |
| 2.1547        | 47    | 40984 | 2.1130          | 1.0       | 138.4339          | 6.9186 |
| 2.1398        | 48    | 41856 | 2.1049          | 1.0       | 138.8303          | 6.9564 |
| 2.151         | 49    | 42728 | 2.1017          | 1.0       | 137.8470          | 6.9989 |
| 2.0868        | 50    | 43600 | 2.1014          | 1.0       | 138.2293          | 7.0305 |
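
The card does not state which BLEU implementation produced the scores in the Bleu column; a common choice in Trainer-based translation runs is sacreBLEU via the evaluate library, sketched below with placeholder sentences.

```python
# Sketch of scoring generated translations with sacreBLEU via `evaluate`;
# which BLEU implementation produced the table above is an assumption.
import evaluate

metric = evaluate.load("sacrebleu")
predictions = ["Le petit prince s'assit sur une pierre."]   # model outputs
references = [["Le petit prince s'assit sur une pierre."]]  # one reference list per prediction
result = metric.compute(predictions=predictions, references=references)
print(f"BLEU: {result['score']:.4f}")  # 0-100 scale, as in the table above
```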

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1