---
base_model:
- aoi-ot/VibeVoice-Large
tags:
- text-to-speech
- tts
- lora
- vibevoice
datasets:
- mozilla-foundation/common_voice_17_0
language:
- hu
---
# VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17
This is a LoRA fine-tune of the VibeVoice 7B (Large) model on a Hungarian audio dataset.
For this particular test I used the train split of the Hungarian config of the Common Voice 17.0 dataset.
To fine-tune the model I used [this code base](https://github.com/voicepowered-ai/VibeVoice-finetuning).
Thanks to [JPGallegoar](https://github.com/jpgallegoar-vpai) for that amazing VibeVoice trainer!
## Inference
To use the LoRA, you can run [my modified fork](https://github.com/cseti007/VibeVoice)
until [this PR](https://github.com/vibevoice-community/VibeVoice/pull/6)
is merged into the main branch of [VibeVoice Community's repository](https://github.com/vibevoice-community/VibeVoice).
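
Before running the fork, the LoRA files need to be on disk. A minimal sketch of one way to fetch them, assuming the `huggingface_hub` package is installed: the repository id is taken from the audio URLs on this page, while the `download_lora` helper and the `loras/hungarian-cv17` target directory are hypothetical names chosen for illustration. How the fork expects the LoRA path to be passed is not covered here, so check its README for the actual inference command.

```python
from pathlib import Path

# Repository id as it appears in the audio URLs of this model card.
LORA_REPO_ID = "Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17"


def download_lora(local_dir: str = "loras/hungarian-cv17") -> Path:
    """Download a snapshot of the LoRA repository and return its local path.

    `local_dir` is an arbitrary, hypothetical default; use any directory you like.
    """
    # Imported lazily so this module can be loaded without the dependency present.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repo and returns the folder path.
    path = snapshot_download(repo_id=LORA_REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_lora()` returns the directory that holds the downloaded files, which you can then point the fork's inference script at.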
## Examples
**Voice without LoRA**
<div style="display: flex; gap: 20px;">
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_nolora-1.wav"></audio>
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s98765_nolora-1.wav"></audio>
</div>
**Voice WITH LoRA**
<div style="display: flex; gap: 20px;">
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_hu-lora_srand3.wav"></audio>
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_hu-lora-1.wav"></audio>
</div>