d8d8fd3ea3376c3aa3c9f3b5ae367e4d
This model is a fine-tuned version of google/umt5-small on the en-fr configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the metrics):
- Loss: 1.6234
- Data Size: 1.0 (fraction of the training data used)
- Epoch Runtime: 502.3375 s
- BLEU: 11.6107
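The sketch below shows one way to load the checkpoint for English-to-French inference. It is a minimal example, not the authors' documented usage: the repo id is taken from the model tree at the end of this card, and whether a task prefix (e.g. "translate English to French: ") was used during fine-tuning is not stated here.

```python
# Minimal inference sketch. Assumptions: the Hub repo id below (from the model tree
# at the end of this card) and that no task prefix is required; neither is documented.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/d8d8fd3ea3376c3aa3c9f3b5ae367e4d"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "The cat sat on the mat."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```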
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows this list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
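A hedged reconstruction of these settings as `Seq2SeqTrainingArguments` is sketched below; the actual training script, output directory, and any generation or preprocessing options are assumptions and are not documented in this card.

```python
# Hypothetical reconstruction of the hyperparameters listed above; the actual
# training script and its remaining options are not documented in this card.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-en-fr",  # assumed name
    learning_rate=5e-05,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    num_train_epochs=50,
    seed=42,
    lr_scheduler_type="constant",
    optim="adamw_torch",             # AdamW defaults: betas=(0.9, 0.999), eps=1e-08
    predict_with_generate=True,      # assumed, since BLEU is reported at each evaluation
)
```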
Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size (fraction) | Epoch Runtime (s) | BLEU |
|---|---|---|---|---|---|---|
| No log | 0.0 | 0 | 15.1108 | 0 | 40.9449 | 0.1801 |
| No log | 1.0 | 3177 | 12.2470 | 0.0078 | 44.7462 | 0.1632 |
| 0.2482 | 2.0 | 6354 | 8.6497 | 0.0156 | 49.0083 | 0.2088 |
| 8.6722 | 3.0 | 9531 | 5.9141 | 0.0312 | 56.0694 | 0.3121 |
| 5.9354 | 4.0 | 12708 | 3.9223 | 0.0625 | 69.6133 | 3.2295 |
| 4.5942 | 5.0 | 15885 | 3.3008 | 0.125 | 98.3453 | 2.8829 |
| 3.8614 | 6.0 | 19062 | 2.8293 | 0.25 | 156.2967 | 4.0112 |
| 3.3935 | 7.0 | 22239 | 2.5678 | 0.5 | 266.4938 | 5.1411 |
| 3.0488 | 8.0 | 25416 | 2.3702 | 1.0 | 512.2830 | 6.2382 |
| 2.8234 | 9.0 | 28593 | 2.2452 | 1.0 | 518.0117 | 6.9661 |
| 2.7209 | 10.0 | 31770 | 2.1731 | 1.0 | 515.0344 | 7.4730 |
| 2.6035 | 11.0 | 34947 | 2.1031 | 1.0 | 521.1200 | 7.9125 |
| 2.5055 | 12.0 | 38124 | 2.0647 | 1.0 | 517.4778 | 8.2337 |
| 2.4309 | 13.0 | 41301 | 2.0198 | 1.0 | 516.4580 | 8.5100 |
| 2.3719 | 14.0 | 44478 | 1.9804 | 1.0 | 515.0311 | 8.7866 |
| 2.3223 | 15.0 | 47655 | 1.9540 | 1.0 | 520.2747 | 8.9727 |
| 2.2457 | 16.0 | 50832 | 1.9312 | 1.0 | 516.9345 | 9.1736 |
| 2.2272 | 17.0 | 54009 | 1.9029 | 1.0 | 516.1580 | 9.3535 |
| 2.2188 | 18.0 | 57186 | 1.8812 | 1.0 | 518.6792 | 9.5288 |
| 2.1583 | 19.0 | 60363 | 1.8677 | 1.0 | 519.5041 | 9.6595 |
| 2.0955 | 20.0 | 63540 | 1.8466 | 1.0 | 522.0993 | 9.7797 |
| 2.0809 | 21.0 | 66717 | 1.8308 | 1.0 | 507.6019 | 9.9145 |
| 2.0634 | 22.0 | 69894 | 1.8122 | 1.0 | 499.3597 | 10.0769 |
| 2.0399 | 23.0 | 73071 | 1.8028 | 1.0 | 502.4851 | 10.1335 |
| 2.0418 | 24.0 | 76248 | 1.7894 | 1.0 | 503.3499 | 10.2583 |
| 2.0029 | 25.0 | 79425 | 1.7749 | 1.0 | 503.2419 | 10.3534 |
| 1.9805 | 26.0 | 82602 | 1.7636 | 1.0 | 500.1708 | 10.4152 |
| 1.9643 | 27.0 | 85779 | 1.7547 | 1.0 | 501.3360 | 10.5282 |
| 1.9555 | 28.0 | 88956 | 1.7424 | 1.0 | 502.3105 | 10.5978 |
| 1.9327 | 29.0 | 92133 | 1.7414 | 1.0 | 502.9374 | 10.6517 |
| 1.9168 | 30.0 | 95310 | 1.7332 | 1.0 | 502.5499 | 10.6957 |
| 1.8928 | 31.0 | 98487 | 1.7253 | 1.0 | 500.4714 | 10.7808 |
| 1.8703 | 32.0 | 101664 | 1.7154 | 1.0 | 506.9298 | 10.8111 |
| 1.8468 | 33.0 | 104841 | 1.7063 | 1.0 | 503.4817 | 10.9205 |
| 1.8673 | 34.0 | 108018 | 1.7023 | 1.0 | 505.0646 | 11.0093 |
| 1.8186 | 35.0 | 111195 | 1.6962 | 1.0 | 502.6725 | 11.0144 |
| 1.7871 | 36.0 | 114372 | 1.6844 | 1.0 | 502.7248 | 11.0728 |
| 1.8113 | 37.0 | 117549 | 1.6838 | 1.0 | 505.2796 | 11.1395 |
| 1.7836 | 38.0 | 120726 | 1.6791 | 1.0 | 501.3685 | 11.1425 |
| 1.7603 | 39.0 | 123903 | 1.6714 | 1.0 | 502.4841 | 11.2427 |
| 1.7565 | 40.0 | 127080 | 1.6582 | 1.0 | 501.6685 | 11.2670 |
| 1.7209 | 41.0 | 130257 | 1.6633 | 1.0 | 508.2311 | 11.3440 |
| 1.7196 | 42.0 | 133434 | 1.6549 | 1.0 | 502.7122 | 11.3669 |
| 1.7148 | 43.0 | 136611 | 1.6542 | 1.0 | 499.5352 | 11.3845 |
| 1.6946 | 44.0 | 139788 | 1.6517 | 1.0 | 504.2522 | 11.4227 |
| 1.6908 | 45.0 | 142965 | 1.6450 | 1.0 | 501.0214 | 11.4683 |
| 1.6435 | 46.0 | 146142 | 1.6402 | 1.0 | 500.5124 | 11.5187 |
| 1.6486 | 47.0 | 149319 | 1.6317 | 1.0 | 503.3988 | 11.5462 |
| 1.6063 | 48.0 | 152496 | 1.6335 | 1.0 | 503.3127 | 11.5403 |
| 1.6474 | 49.0 | 155673 | 1.6255 | 1.0 | 503.6295 | 11.6011 |
| 1.6467 | 50.0 | 158850 | 1.6234 | 1.0 | 502.3375 | 11.6107 |
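The BLEU column can in principle be reproduced by scoring the model's generated translations against the reference translations. The sketch below uses the `evaluate` library's sacreBLEU metric; the exact BLEU implementation used during training is not documented here.

```python
# Hedged sketch: scoring generated translations with sacreBLEU via the `evaluate`
# library. The exact metric implementation behind the table above is not documented.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Le chat est assis sur le tapis."]       # model outputs
references = [["Le chat était assis sur le tapis."]]    # one list of references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))
```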
Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1
Model tree for contemmcm/d8d8fd3ea3376c3aa3c9f3b5ae367e4d
- Base model: google/umt5-small