iou-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts; the training dataset is not specified. At the final evaluation step (40000) it reaches a validation loss of 0.4800 on the evaluation set.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
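
The learning-rate schedule implied by these settings can be sketched in plain Python. This is a minimal sketch, assuming the linear scheduler follows the usual warmup-then-decay rule: ramp from 0 to the peak rate over the warmup steps, then decay linearly to 0 at the last training step. The function name lr_at_step is illustrative and not part of the training code, and the batch-size arithmetic assumes a single device.

```python
# Values copied from the hyperparameter list above.
PEAK_LR = 1e-4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4


def lr_at_step(step: int) -> float:
    """Learning rate after `step` optimizer updates.

    Assumption: linear warmup followed by linear decay to zero.
    """
    if step < WARMUP_STEPS:
        scale = step / max(1, WARMUP_STEPS)
    else:
        remaining = TRAINING_STEPS - step
        scale = max(0.0, remaining / max(1, TRAINING_STEPS - WARMUP_STEPS))
    return PEAK_LR * scale


# Examples consumed per optimizer update (single device assumed).
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

print("effective batch size:", effective_batch_size)
for step in (1000, 4000, 22000, 40000):
    print(step, lr_at_step(step))
```

Under these assumptions the rate peaks at step 4000, one tenth of the way through training, and the effective batch size matches the total_train_batch_size of 32 listed above.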

Training Loss	Epoch	Step	Validation Loss
0.5387	5.2918	1000	0.5152
0.493	10.5836	2000	0.4955
0.4935	15.8753	3000	0.4885
0.4846	21.1645	4000	0.4863
0.4717	26.4562	5000	0.4825
0.4532	31.7480	6000	0.4804
0.4841	37.0371	7000	0.4802
0.458	42.3289	8000	0.4791
0.4454	47.6207	9000	0.4822
0.4461	52.9125	10000	0.4790
0.4362	58.2016	11000	0.4789
0.4301	63.4934	12000	0.4789
0.43	68.7851	13000	0.4806
0.4392	74.0743	14000	0.4796
0.4355	79.3660	15000	0.4797
0.4273	84.6578	16000	0.4778
0.4324	89.9496	17000	0.4808
0.4239	95.2387	18000	0.4792
0.4174	100.5305	19000	0.4786
0.4206	105.8223	20000	0.4777
0.4104	111.1114	21000	0.4784
0.4121	116.4032	22000	0.4797
0.4087	121.6950	23000	0.4800
0.4115	126.9867	24000	0.4788
0.405	132.2759	25000	0.4799
0.4091	137.5676	26000	0.4795
0.4165	142.8594	27000	0.4799
0.4059	148.1485	28000	0.4792
0.4092	153.4403	29000	0.4797
0.4006	158.7321	30000	0.4791
0.4033	164.0212	31000	0.4789
0.3929	169.3130	32000	0.4796
0.4024	174.6048	33000	0.4803
0.3988	179.8966	34000	0.4785
0.3965	185.1857	35000	0.4792
0.3914	190.4775	36000	0.4795
0.3967	195.7692	37000	0.4811
0.3994	201.0584	38000	0.4800
0.4019	206.3501	39000	0.4805
0.4005	211.6419	40000	0.4800
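
To read the table above programmatically, a short sketch like the following can pick out the checkpoint with the lowest validation loss and estimate the number of optimizer steps per epoch. The rows are a hand-copied subset of the table (it includes the lowest validation loss reported), and the helper name parse_rows is illustrative.

```python
# Subset of rows from the table above:
# training loss, epoch, step, validation loss.
ROWS = """
0.5387 5.2918 1000 0.5152
0.4461 52.9125 10000 0.4790
0.4273 84.6578 16000 0.4778
0.4206 105.8223 20000 0.4777
0.4006 158.7321 30000 0.4791
0.4005 211.6419 40000 0.4800
"""


def parse_rows(text):
    """Turn whitespace-separated rows into dicts with numeric fields."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        records.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return records


records = parse_rows(ROWS)

# Checkpoint with the lowest validation loss among the listed rows.
best = min(records, key=lambda r: r["val_loss"])

# Optimizer steps per epoch, estimated from the final row.
last = records[-1]
steps_per_epoch = round(last["step"] / last["epoch"])

# Gap between validation and training loss at the first and last rows.
gap_first = records[0]["val_loss"] - records[0]["train_loss"]
gap_last = last["val_loss"] - last["train_loss"]

print(best["step"], best["val_loss"], steps_per_epoch)
```

In the full table the training loss keeps falling (0.5387 to 0.4005) while the validation loss stays between 0.4777 and 0.4822 from step 8000 onward, so an earlier checkpoint may generalize about as well as the final one.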

Safetensors checkpoint:

Model size: 0.1B params
Tensor type: F32
