0f5b17592027cb4e4a816d79557b1911

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [en-nl] dataset. It achieves the following results on the evaluation set:

Loss: 2.3695
Data Size: 1.0
Epoch Runtime: 152.9586
Bleu: 6.5964
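
The card gives no usage notes, so the following is only a minimal sketch of how an English-to-Dutch translation could be requested through the Transformers auto classes. The helper name `translate` and the `max_new_tokens` value are illustrative assumptions, and the card does not say whether the model expects a task prefix on the input, so the text is passed through unchanged.

```python
# Repository id as shown on the model page.
MODEL_ID = "contemmcm/0f5b17592027cb4e4a816d79557b1911"


def translate(text, max_new_tokens=64):
    """Hypothetical helper: translate English text to Dutch with this checkpoint.

    Assumption: no task prefix is required; the card does not document one.
    """
    # Imported lazily so that defining the helper does not require the
    # library (or a network connection) until it is actually called.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `translate("The book is on the table.")` downloads the checkpoint on first use. With a reported BLEU of about 6.6, the output should be treated as rough.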

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
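
The totals in the list above follow from the per-device values. As a small sanity check, the arithmetic can be sketched as below. It assumes no gradient accumulation, which the card does not list, and takes the 966 steps per epoch from the Step column of the training results.

```python
# Values copied from the hyperparameter list.
train_batch_size = 8   # per device
eval_batch_size = 8    # per device
num_devices = 4
num_epochs = 50

# Assumption: gradient accumulation is 1 (not listed in the card).
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

# The Step column in the training results advances by 966 per epoch.
steps_per_epoch = 966
total_steps = steps_per_epoch * num_epochs

print(total_train_batch_size, total_eval_batch_size, total_steps)
```

The results match the reported total_train_batch_size and total_eval_batch_size of 32 and the final step count of 48300.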

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	15.9254	0	13.2639	0.2464
No log	1	966	15.1955	0.0078	14.4259	0.2560
No log	2	1932	13.7094	0.0156	15.9085	0.2756
0.3741	3	2898	11.4368	0.0312	18.2524	0.2662
12.1281	4	3864	7.6606	0.0625	22.6134	0.3354
8.1543	5	4830	5.4229	0.125	31.1433	0.7218
5.7585	6	5796	4.2528	0.25	48.2378	2.6981
4.7426	7	6762	3.6188	0.5	81.8670	2.1389
4.1624	8.0	7728	3.2714	1.0	150.3053	3.0444
3.9018	9.0	8694	3.1277	1.0	149.6269	3.4989
3.7533	10.0	9660	3.0298	1.0	149.3464	3.7639
3.5774	11.0	10626	2.9638	1.0	149.5916	4.0540
3.5013	12.0	11592	2.9118	1.0	147.9367	4.2137
3.3939	13.0	12558	2.8668	1.0	146.5442	4.3870
3.3578	14.0	13524	2.8287	1.0	149.3875	4.5341
3.2786	15.0	14490	2.7896	1.0	149.3572	4.6710
3.2392	16.0	15456	2.7568	1.0	148.9842	4.7764
3.1845	17.0	16422	2.7364	1.0	150.1884	4.8800
3.1296	18.0	17388	2.7068	1.0	149.8945	5.0243
3.1311	19.0	18354	2.6832	1.0	148.9657	5.1445
3.1132	20.0	19320	2.6625	1.0	149.4043	5.1978
3.0214	21.0	20286	2.6432	1.0	149.5657	5.2519
3.016	22.0	21252	2.6233	1.0	150.0480	5.3606
2.9664	23.0	22218	2.6075	1.0	150.8476	5.4040
2.9153	24.0	23184	2.5908	1.0	150.1463	5.5002
2.8783	25.0	24150	2.5782	1.0	148.1497	5.5732
2.8666	26.0	25116	2.5640	1.0	147.8534	5.6270
2.8334	27.0	26082	2.5496	1.0	148.1002	5.6465
2.8429	28.0	27048	2.5350	1.0	149.7204	5.7392
2.7873	29.0	28014	2.5227	1.0	148.0967	5.7758
2.7577	30.0	28980	2.5112	1.0	149.1964	5.8476
2.7046	31.0	29946	2.5076	1.0	150.6974	5.8756
2.7203	32.0	30912	2.4945	1.0	150.1795	5.9264
2.7432	33.0	31878	2.4899	1.0	148.6915	5.9398
2.6346	34.0	32844	2.4738	1.0	149.3140	6.0006
2.6742	35.0	33810	2.4651	1.0	150.2584	6.0440
2.6665	36.0	34776	2.4546	1.0	149.3136	6.1093
2.6591	37.0	35742	2.4435	1.0	149.4992	6.1513
2.6097	38.0	36708	2.4378	1.0	150.3728	6.1945
2.554	39.0	37674	2.4401	1.0	148.6280	6.2043
2.5815	40.0	38640	2.4343	1.0	149.8191	6.2463
2.5328	41.0	39606	2.4254	1.0	149.9701	6.2778
2.5702	42.0	40572	2.4145	1.0	149.9797	6.3117
2.5287	43.0	41538	2.4093	1.0	148.9173	6.3228
2.468	44.0	42504	2.4027	1.0	150.7585	6.4079
2.5006	45.0	43470	2.4004	1.0	150.8242	6.4404
2.4746	46.0	44436	2.3897	1.0	150.2094	6.4408
2.4475	47.0	45402	2.3881	1.0	150.0153	6.4717
2.4588	48.0	46368	2.3873	1.0	152.1674	6.5224
2.4383	49.0	47334	2.3768	1.0	150.8109	6.5423
2.4305	50.0	48300	2.3695	1.0	152.9586	6.5964
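
The Data Size column appears to double each epoch until it reaches 1.0 at epoch 8. The card does not describe this schedule, so the function below is only an inference from the logged values, and `data_fraction` is a hypothetical name.

```python
def data_fraction(epoch, full_data_epoch=8):
    """Fraction of the training data apparently used at a given epoch.

    Inferred from the table, not documented: 0 at epoch 0, then doubling
    each epoch and capped at 1.0 from `full_data_epoch` onwards.
    """
    if epoch <= 0:
        return 0.0
    return min(1.0, 2.0 ** (epoch - full_data_epoch))


# Compare with the logged column (the table shows 4 decimal places).
for epoch in range(0, 10):
    print(epoch, data_fraction(epoch))
```

For epochs 1 to 3 this gives 0.0078125, 0.015625 and 0.03125, which the table logs as 0.0078, 0.0156 and 0.0312. This would also explain why the Epoch Runtime column grows until epoch 8 and then stays near 150.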

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1

Model size: 0.6B params (Safetensors)
Tensor type: F32

Model tree for contemmcm/0f5b17592027cb4e4a816d79557b1911: fine-tuned from the base model google/umt5-small.
