iitb_punct_orig_finetuned_eng_Ltn_to_mar_Deva

This model is a fine-tuned version of ai4bharat/indictrans2-indic-indic-dist-320M on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4013
  • Bleu: 9.6977
  • Gen Len: 20.8692
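A minimal inference sketch, assuming the standard IndicTrans2 usage pattern (a `transformers` seq2seq model loaded with `trust_remote_code=True`, plus `IndicProcessor` from IndicTransToolkit for language-tag handling). The language tags `eng_Latn`/`mar_Deva` are inferred from the model name, and `preprocess_batch`/`postprocess_batch` follow the IndicTransToolkit API; treat both as assumptions:

```python
# Hedged sketch of IndicTrans2-style inference; language tags inferred
# from the model name (eng_Ltn -> eng_Latn, mar_Deva).
SRC_LANG, TGT_LANG = "eng_Latn", "mar_Deva"
MODEL_ID = "thenlpresearcher/iitb_punct_orig_finetuned_eng_Ltn_to_mar_Deva"

def translate(sentences, model, tokenizer, processor, max_new_tokens=256):
    """Translate English sentences to Marathi with the fine-tuned model.

    `processor` is assumed to be an IndicTransToolkit IndicProcessor,
    which adds the source/target tags expected by IndicTrans2.
    """
    batch = processor.preprocess_batch(sentences, src_lang=SRC_LANG, tgt_lang=TGT_LANG)
    inputs = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
    generated = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=5)
    decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
    return processor.postprocess_batch(decoded, lang=TGT_LANG)
```

The model and tokenizer would be loaded with `AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID, trust_remote_code=True)` and `AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)`, and the processor with `IndicProcessor(inference=True)`.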

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 10
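The list above corresponds roughly to the following `Seq2SeqTrainingArguments` sketch. The `output_dir` and the `predict_with_generate` flag are assumptions (the latter is typically needed to produce the BLEU and Gen Len metrics reported here), not values stated on the card:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the hyperparameters listed above; output_dir is a placeholder.
args = Seq2SeqTrainingArguments(
    output_dir="iitb_punct_orig_finetuned_eng_Ltn_to_mar_Deva",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    predict_with_generate=True,  # assumed: required for generation-based eval metrics
)
```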

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Bleu   | Gen Len |
|:-------------:|:------:|:------:|:---------------:|:------:|:-------:|
| 0.5705        | 0.3373 | 4000   | 0.5325          | 7.2659 | 20.7523 |
| 0.536         | 0.6746 | 8000   | 0.4951          | 7.9775 | 20.8722 |
| 0.4615        | 1.0119 | 12000  | 0.4757          | 8.1546 | 20.8717 |
| 0.4643        | 1.3492 | 16000  | 0.4606          | 8.4812 | 20.8716 |
| 0.4545        | 1.6865 | 20000  | 0.4496          | 8.6764 | 20.8743 |
| 0.4274        | 2.0238 | 24000  | 0.4421          | 8.7579 | 20.8725 |
| 0.4254        | 2.3611 | 28000  | 0.4341          | 8.937  | 20.8695 |
| 0.4089        | 2.6984 | 32000  | 0.4300          | 8.9973 | 20.8713 |
| 0.3813        | 3.0357 | 36000  | 0.4264          | 9.1282 | 20.8735 |
| 0.3794        | 3.3730 | 40000  | 0.4221          | 9.1568 | 20.8731 |
| 0.3887        | 3.7103 | 44000  | 0.4173          | 9.2067 | 20.8692 |
| 0.3416        | 4.0476 | 48000  | 0.4169          | 9.3934 | 20.8712 |
| 0.3581        | 4.3849 | 52000  | 0.4131          | 9.4104 | 20.8683 |
| 0.3596        | 4.7222 | 56000  | 0.4099          | 9.3756 | 20.8716 |
| 0.3244        | 5.0594 | 60000  | 0.4116          | 9.4521 | 20.872  |
| 0.3366        | 5.3967 | 64000  | 0.4085          | 9.4955 | 20.8652 |
| 0.3489        | 5.7340 | 68000  | 0.4056          | 9.4947 | 20.8741 |
| 0.3235        | 6.0713 | 72000  | 0.4075          | 9.4984 | 20.8682 |
| 0.3347        | 6.4086 | 76000  | 0.4054          | 9.5506 | 20.8708 |
| 0.328         | 6.7459 | 80000  | 0.4042          | 9.6531 | 20.8686 |
| 0.3169        | 7.0832 | 84000  | 0.4059          | 9.6254 | 20.8692 |
| 0.3137        | 7.4205 | 88000  | 0.4039          | 9.6441 | 20.8683 |
| 0.3063        | 7.7578 | 92000  | 0.4023          | 9.6607 | 20.8689 |
| 0.2955        | 8.0951 | 96000  | 0.4031          | 9.6606 | 20.8692 |
| 0.3082        | 8.4324 | 100000 | 0.4026          | 9.6643 | 20.8682 |
| 0.3133        | 8.7697 | 104000 | 0.4014          | 9.6642 | 20.8692 |
| 0.2981        | 9.1070 | 108000 | 0.4028          | 9.6811 | 20.8696 |
| 0.2857        | 9.4443 | 112000 | 0.4011          | 9.6914 | 20.8695 |
| 0.2868        | 9.7816 | 116000 | 0.4013          | 9.6977 | 20.8692 |
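Since the scheduler is linear and no warmup is reported (the Trainer default is zero warmup steps), the learning rate at any optimizer step can be sketched as below. The total step count of roughly 118,600 is an inference from the table (step 4000 ≈ epoch 0.3373, i.e. about 11,860 steps per epoch over 10 epochs), not a value stated on the card:

```python
def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Linear schedule: ramp up over warmup_steps (0 here), then decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Total optimizer steps inferred from the table above (~11,860 steps/epoch x 10).
TOTAL_STEPS = 118_600
```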

Framework versions

  • Transformers 4.53.2
  • Pytorch 2.9.0+cu128
  • Datasets 2.21.0
  • Tokenizers 0.21.4
Model size: 0.3B params · Tensor type: F32 (Safetensors)