iou-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not recorded in the card metadata (it is listed as None). It achieves the following results on the evaluation set:

  • Loss: 0.4800

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED, the fused PyTorch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP
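To make the interplay of these settings concrete, here is a minimal sketch in plain Python of the effective batch size and of a linear schedule with warmup. It assumes the usual behavior of a linear scheduler (the rate rises linearly from 0 to the peak over the warmup steps, then falls linearly to 0 at the final step) and a single training device, which is consistent with 8 × 4 = 32 but is not stated in the card. The function names are hypothetical and are not taken from the training script.

```python
# Values copied from the hyperparameter list above.
PEAK_LR = 1e-4        # learning_rate
WARMUP_STEPS = 4000   # lr_scheduler_warmup_steps
TOTAL_STEPS = 40000   # training_steps
TRAIN_BATCH = 8       # train_batch_size
GRAD_ACCUM = 4        # gradient_accumulation_steps


def effective_batch_size(batch, grad_accum, num_devices=1):
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return batch * grad_accum * num_devices


def linear_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < warmup:
        # Warmup phase: ramp from 0 up to the peak rate.
        return peak_lr * step / warmup
    # Decay phase: fall linearly from the peak to 0 at the last step.
    return peak_lr * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH, GRAD_ACCUM))
    for step in (0, 2000, 4000, 22000, 40000):
        print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 4000, which is 10% of the run, and has fallen to half of the peak again by step 22000.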

Training results

Training Loss  Epoch     Step   Validation Loss
0.5387         5.2918    1000   0.5152
0.493          10.5836   2000   0.4955
0.4935         15.8753   3000   0.4885
0.4846         21.1645   4000   0.4863
0.4717         26.4562   5000   0.4825
0.4532         31.7480   6000   0.4804
0.4841         37.0371   7000   0.4802
0.458          42.3289   8000   0.4791
0.4454         47.6207   9000   0.4822
0.4461         52.9125   10000  0.4790
0.4362         58.2016   11000  0.4789
0.4301         63.4934   12000  0.4789
0.43           68.7851   13000  0.4806
0.4392         74.0743   14000  0.4796
0.4355         79.3660   15000  0.4797
0.4273         84.6578   16000  0.4778
0.4324         89.9496   17000  0.4808
0.4239         95.2387   18000  0.4792
0.4174         100.5305  19000  0.4786
0.4206         105.8223  20000  0.4777
0.4104         111.1114  21000  0.4784
0.4121         116.4032  22000  0.4797
0.4087         121.6950  23000  0.4800
0.4115         126.9867  24000  0.4788
0.405          132.2759  25000  0.4799
0.4091         137.5676  26000  0.4795
0.4165         142.8594  27000  0.4799
0.4059         148.1485  28000  0.4792
0.4092         153.4403  29000  0.4797
0.4006         158.7321  30000  0.4791
0.4033         164.0212  31000  0.4789
0.3929         169.3130  32000  0.4796
0.4024         174.6048  33000  0.4803
0.3988         179.8966  34000  0.4785
0.3965         185.1857  35000  0.4792
0.3914         190.4775  36000  0.4795
0.3967         195.7692  37000  0.4811
0.3994         201.0584  38000  0.4800
0.4019         206.3501  39000  0.4805
0.4005         211.6419  40000  0.4800
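The logged results can be summarized programmatically. The sketch below is illustrative only: it copies the rows above into a string, parses them, and reports the step with the lowest validation loss, the gap between training and validation loss at the final step, and an approximate number of optimizer steps per epoch. The helper name parse_rows and the record layout are hypothetical, and the steps-per-epoch figure is only an estimate derived from the logged epoch counter.

```python
# Rows copied from the results table: training loss, epoch, step, validation loss.
ROWS = """\
0.5387 5.2918 1000 0.5152
0.493 10.5836 2000 0.4955
0.4935 15.8753 3000 0.4885
0.4846 21.1645 4000 0.4863
0.4717 26.4562 5000 0.4825
0.4532 31.7480 6000 0.4804
0.4841 37.0371 7000 0.4802
0.458 42.3289 8000 0.4791
0.4454 47.6207 9000 0.4822
0.4461 52.9125 10000 0.4790
0.4362 58.2016 11000 0.4789
0.4301 63.4934 12000 0.4789
0.43 68.7851 13000 0.4806
0.4392 74.0743 14000 0.4796
0.4355 79.3660 15000 0.4797
0.4273 84.6578 16000 0.4778
0.4324 89.9496 17000 0.4808
0.4239 95.2387 18000 0.4792
0.4174 100.5305 19000 0.4786
0.4206 105.8223 20000 0.4777
0.4104 111.1114 21000 0.4784
0.4121 116.4032 22000 0.4797
0.4087 121.6950 23000 0.4800
0.4115 126.9867 24000 0.4788
0.405 132.2759 25000 0.4799
0.4091 137.5676 26000 0.4795
0.4165 142.8594 27000 0.4799
0.4059 148.1485 28000 0.4792
0.4092 153.4403 29000 0.4797
0.4006 158.7321 30000 0.4791
0.4033 164.0212 31000 0.4789
0.3929 169.3130 32000 0.4796
0.4024 174.6048 33000 0.4803
0.3988 179.8966 34000 0.4785
0.3965 185.1857 35000 0.4792
0.3914 190.4775 36000 0.4795
0.3967 195.7692 37000 0.4811
0.3994 201.0584 38000 0.4800
0.4019 206.3501 39000 0.4805
0.4005 211.6419 40000 0.4800
"""


def parse_rows(text):
    """Turn whitespace-separated log rows into a list of dicts."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        records.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return records


records = parse_rows(ROWS)
best = min(records, key=lambda r: r["val_loss"])  # lowest validation loss
last = records[-1]                                # final evaluation
final_gap = last["val_loss"] - last["train_loss"]
steps_per_epoch = last["step"] / last["epoch"]    # rough estimate only

print("best step:", best["step"], "validation loss:", best["val_loss"])
print("final train/validation gap:", round(final_gap, 4))
print("approx. optimizer steps per epoch:", round(steps_per_epoch))
```

In these logs the validation loss stays between roughly 0.478 and 0.481 from step 8000 onward while the training loss keeps falling. This may indicate that the later steps mostly fit the training set, so an earlier checkpoint could be comparable to the final one, although loss alone says little about synthesized audio quality.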

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Safetensors: model size 0.1B params, tensor type F32

Model tree for sil-ai/iou-chapter-audio-dataset-force-aligned-speecht5: this model is fine-tuned from microsoft/speecht5_tts.