edcdeeeb6ee9c1caa2b5ad69566c650c

This model is a fine-tuned version of google/umt5-small on the English-Portuguese (en-pt) subset of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

Loss: 2.0008
Data Size: 1.0
Epoch Runtime: 8.8101
Bleu: 10.6357
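The Bleu figure appears to be on the usual 0-100 scale. To illustrate what it measures, the sketch below computes corpus-level BLEU: clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. The card does not show the evaluation code used for this model, so this is not it. Whitespace tokenization, a single reference per sentence and the absence of smoothing are all assumptions, and its scores will not match sacreBLEU exactly. `corpus_bleu` is a hypothetical helper.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence, scaled to 0-100.

    Assumption: plain whitespace tokenization and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or 0 in matches:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


if __name__ == "__main__":
    print(corpus_bleu(["o livro está na mesa"], ["o livro está na mesa"]))  # 100.0
    print(corpus_bleu(["o livro está na"], ["o livro está na mesa"]))  # about 77.88
```

In the second call every n-gram precision is 1, so the score falls below 100 only because of the brevity penalty.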

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
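The listed total_train_batch_size of 32 is the per-device batch of 8 multiplied by the 4 devices. The sketch below shows that arithmetic. It then derives the range of training-set sizes consistent with the 35 steps per epoch seen in the Step column of the training results. This rests on two assumptions: there is no gradient accumulation (none is listed), and each Step is one optimizer step over the full training split. `effective_batch_size` and `train_set_size_range` are hypothetical helpers, not library functions.

```python
import math

# Values from the hyperparameter list above.
TRAIN_BATCH_SIZE = 8   # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 1   # assumption: gradient accumulation is not listed on the card
STEPS_PER_EPOCH = 35   # the Step column grows by 35 per epoch in the training results


def effective_batch_size(per_device, devices, grad_accum=1):
    """Examples consumed per optimizer step with data-parallel training."""
    return per_device * devices * grad_accum


def train_set_size_range(steps_per_epoch, batch_size):
    """Smallest and largest N with ceil(N / batch_size) == steps_per_epoch.

    Assumes the last, partial batch of each epoch is kept rather than dropped.
    """
    low = (steps_per_epoch - 1) * batch_size + 1
    high = steps_per_epoch * batch_size
    return low, high


if __name__ == "__main__":
    total = effective_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS)
    low, high = train_set_size_range(STEPS_PER_EPOCH, total)
    print(total)      # 32
    print(low, high)  # 1089 1120
    # Sanity check: both ends of the range really give 35 steps per epoch.
    assert math.ceil(low / total) == math.ceil(high / total) == STEPS_PER_EPOCH
```

Under these assumptions the training split would hold between 1,089 and 1,120 sentence pairs. The card itself does not state the split sizes.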

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	16.2919	0	1.2201	0.4835
No log	1	35	16.1880	0.0078	1.5263	0.7499
No log	2	70	16.0658	0.0156	1.7106	0.7946
No log	3	105	15.9703	0.0312	2.8809	0.6101
No log	4	140	15.9027	0.0625	3.6011	0.9109
No log	5	175	15.7632	0.125	4.1938	0.7681
No log	6	210	14.8865	0.25	4.6713	0.4877
No log	7	245	13.5589	0.5	6.0358	0.8687
3.2337	8.0	280	10.5702	1.0	9.2415	0.5636
13.3707	9.0	315	8.2162	1.0	8.6928	0.4153
10.0817	10.0	350	6.7948	1.0	8.6704	0.3472
10.0817	11.0	385	5.6901	1.0	8.9605	1.1900
8.1482	12.0	420	4.7732	1.0	8.9692	2.1952
7.0952	13.0	455	4.2501	1.0	9.7834	2.3122
7.0952	14.0	490	3.9309	1.0	6.8089	4.1742
6.3654	15.0	525	3.7403	1.0	7.1984	4.8892
5.828	16.0	560	3.5575	1.0	7.2448	6.9155
5.828	17.0	595	3.4114	1.0	7.5136	6.3853
5.4176	18.0	630	3.2819	1.0	7.4949	3.7225
5.0557	19.0	665	3.1687	1.0	7.4330	3.2310
4.7981	20.0	700	3.0727	1.0	7.4278	3.3432
4.7981	21.0	735	2.9720	1.0	7.5395	3.6462
4.5602	22.0	770	2.8854	1.0	7.9688	3.8492
4.3773	23.0	805	2.8101	1.0	7.9674	4.0627
4.3773	24.0	840	2.7270	1.0	8.0725	3.1310
4.1677	25.0	875	2.6455	1.0	8.4481	2.6736
4.0148	26.0	910	2.5817	1.0	8.9851	2.7965
4.0148	27.0	945	2.5152	1.0	6.7198	2.8878
3.842	28.0	980	2.4614	1.0	6.6934	2.9649
3.6842	29.0	1015	2.4123	1.0	6.7499	3.2196
3.6065	30.0	1050	2.3650	1.0	6.7672	12.7242
3.6065	31.0	1085	2.3169	1.0	7.2906	19.2524
3.4403	32.0	1120	2.2841	1.0	7.6503	19.2987
3.3772	33.0	1155	2.2433	1.0	7.8124	11.8892
3.3772	34.0	1190	2.2220	1.0	7.5171	9.8309
3.2671	35.0	1225	2.1913	1.0	7.3929	9.5611
3.1826	36.0	1260	2.1712	1.0	7.9977	9.6651
3.1826	37.0	1295	2.1564	1.0	7.7877	9.7081
3.1156	38.0	1330	2.1336	1.0	7.8609	9.7492
3.0241	39.0	1365	2.1186	1.0	8.5171	9.7944
2.963	40.0	1400	2.1051	1.0	7.1898	9.8516
2.963	41.0	1435	2.0857	1.0	7.4999	9.9685
2.903	42.0	1470	2.0740	1.0	7.7822	10.1261
2.8212	43.0	1505	2.0663	1.0	7.8828	10.2682
2.8212	44.0	1540	2.0470	1.0	7.9964	10.3864
2.7466	45.0	1575	2.0456	1.0	8.0995	10.3145
2.7288	46.0	1610	2.0331	1.0	7.9713	10.5252
2.7288	47.0	1645	2.0220	1.0	8.2568	10.4841
2.6657	48.0	1680	2.0138	1.0	8.1586	10.5815
2.6303	49.0	1715	2.0024	1.0	8.3825	10.6827
2.562	50.0	1750	2.0008	1.0	8.8101	10.6357
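Two patterns in this table are easy to miss. First, the Data Size column appears to double each epoch (about 1/128 of the data at epoch 1, 1/64 at epoch 2, and so on) until the full training set is used from epoch 8 onward. This schedule is inferred from the numbers and is not documented on the card. Second, BLEU peaks at 19.2987 in epoch 32, while validation loss keeps falling to its minimum of 2.0008 at epoch 50. The summary at the top therefore reports the final epoch, not the best-BLEU one. The sketch below reproduces the inferred schedule and picks the best epoch under either metric, using a few rows copied from the table. `data_size` and `best_epoch` are hypothetical helpers.

```python
# (epoch, validation_loss, bleu): a few rows copied from the training results table
ROWS = [
    (30, 2.3650, 12.7242),
    (31, 2.3169, 19.2524),
    (32, 2.2841, 19.2987),
    (33, 2.2433, 11.8892),
    (49, 2.0024, 10.6827),
    (50, 2.0008, 10.6357),
]
VAL_LOSS, BLEU = 1, 2  # column indices into each row


def best_epoch(rows, column, higher_is_better):
    """Return the epoch whose value in `column` is best."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


def data_size(epoch, full_at=8):
    """Inferred (not documented) Data Size schedule.

    Epoch 0 is the evaluation before any training. After that the fraction
    doubles each epoch until it reaches 1.0 at epoch `full_at`.
    """
    if epoch == 0:
        return 0.0
    return min(1.0, 2.0 ** (epoch - full_at))


if __name__ == "__main__":
    print(best_epoch(ROWS, BLEU, higher_is_better=True))       # 32
    print(best_epoch(ROWS, VAL_LOSS, higher_is_better=False))  # 50
    # Rounded to 4 decimals, this matches the Data Size column for epochs 0-8:
    # [0.0, 0.0078, 0.0156, 0.0312, 0.0625, 0.125, 0.25, 0.5, 1.0]
    print([round(data_size(e), 4) for e in range(9)])
```

The right checkpoint depends on the goal. Validation loss tracks the training objective, while BLEU measures the n-gram overlap of generated translations with the references. The card does not state whether intermediate checkpoints were kept.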

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1

Model size: 0.6B params (Safetensors)

Tensor type: F32
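The parameter count and tensor type give a quick lower bound on the memory needed just to hold the weights, since each F32 value takes 4 bytes. The sketch below does that arithmetic with the rounded "0.6B" figure, so the result is approximate. It ignores activations, generation buffers and framework overhead. The half-precision entries are included only for comparison; the card does not say whether this model behaves well in half precision. `weight_memory_bytes` is a hypothetical helper.

```python
PARAMS = 600_000_000  # "0.6B params" as displayed on the card (rounded)
BYTES_PER_VALUE = {"F32": 4, "BF16": 2, "F16": 2}


def weight_memory_bytes(num_params, dtype="F32"):
    """Bytes needed to store the weights alone (no activations or overhead)."""
    return num_params * BYTES_PER_VALUE[dtype]


if __name__ == "__main__":
    n_bytes = weight_memory_bytes(PARAMS)
    print(n_bytes)                    # 2400000000
    print(round(n_bytes / 2**30, 2))  # 2.24 (GiB)
```

For 0.6B F32 parameters this comes to 2.4 GB, or about 2.24 GiB, before any runtime overhead.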

Model tree for contemmcm/edcdeeeb6ee9c1caa2b5ad69566c650c

Base model: google/umt5-small

This model: contemmcm/edcdeeeb6ee9c1caa2b5ad69566c650c (fine-tuned from the base model)
