---
base_model:
- aoi-ot/VibeVoice-Large
tags:
- text-to-speech
- tts
- lora
- vibevoice
datasets:
- mozilla-foundation/common_voice_17_0
language:
- hu
---
# VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17
This is a LoRA fine-tune of the VibeVoice 7B (Large) model on a Hungarian audio dataset.
For this test I used the train split of the Hungarian config of the Common Voice 17.0 dataset.
To fine-tune the model I used [this code base](https://github.com/voicepowered-ai/VibeVoice-finetuning).
Thanks to [JPGallegoar](https://github.com/jpgallegoar-vpai) for that amazing VibeVoice trainer!
## Inference
To use the LoRA, you can use [my modified fork](https://github.com/cseti007/VibeVoice)
until [this PR](https://github.com/vibevoice-community/VibeVoice/pull/6)
is merged into the main branch of [VibeVoice Community's repository](https://github.com/vibevoice-community/VibeVoice).
## Examples
**Voice without LoRA**
<div style="display: flex; gap: 20px;">
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_nolora-1.wav"></audio>
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s98765_nolora-1.wav"></audio>
</div>
**Voice WITH LoRA**
<div style="display: flex; gap: 20px;">
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_hu-lora_srand3.wav"></audio>
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_hu-lora-1.wav"></audio>
</div>
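
If you would rather compare the clips locally than in the browser, the sketch below rebuilds the `resolve` URLs of the four samples above and downloads them with the Python standard library. The file names are copied from the audio players in this section; the helper names `sample_url` and `download_samples` and the `samples` target folder are hypothetical and only serve as illustration.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

REPO_ID = "Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17"

# File names copied from the audio players in the Examples section.
SAMPLES = {
    "without_lora": ["synth_s42_nolora-1.wav", "synth_s98765_nolora-1.wav"],
    "with_lora": ["synth_hu-lora_srand3.wav", "synth_s42_hu-lora-1.wav"],
}


def sample_url(filename: str, revision: str = "main") -> str:
    """Build the Hugging Face `resolve` URL of a file in the repo's assets folder."""
    return (
        f"https://huggingface.co/{REPO_ID}/resolve/{revision}/assets/{quote(filename)}"
    )


def download_samples(target_dir: str = "samples") -> list[Path]:
    """Download every sample clip into target_dir (hypothetical helper, needs network)."""
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for group, names in SAMPLES.items():
        for name in names:
            # Prefix each file with its group so the two sets are easy to tell apart.
            dest = out_dir / f"{group}__{name}"
            urlretrieve(sample_url(name), dest)
            saved.append(dest)
    return saved
```

Calling `download_samples()` writes the four WAV files into `samples/`, prefixed with `without_lora__` or `with_lora__`, so the two groups can be compared side by side in any audio player.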