---
base_model:
- aoi-ot/VibeVoice-Large
tags:
- text-to-speech
- tts
- lora
- vibevoice
datasets:
- mozilla-foundation/common_voice_17_0
language:
- hu
---

# VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17

This is a LoRA finetune of the VibeVoice 7B (Large) model on a Hungarian audio dataset.

For this test I used the train split of the Hungarian (`hu`) configuration of the Common Voice 17.0 dataset.

To finetune the model I used [this training code base](https://github.com/voicepowered-ai/VibeVoice-finetuning).

Thanks to [JPGallegoar](https://github.com/jpgallegoar-vpai) for that amazing VibeVoice trainer!

## Inference

To run inference with the LoRA, use [my modified fork](https://github.com/cseti007/VibeVoice) until [this PR](https://github.com/vibevoice-community/VibeVoice/pull/6) is merged into the main branch of [VibeVoice Community's repository](https://github.com/vibevoice-community/VibeVoice).

## Examples

**Voice without LoRA**

<div style="display: flex; gap: 20px;">
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_nolora-1.wav"></audio>
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s98765_nolora-1.wav"></audio>
</div>

**Voice WITH LoRA**

<div style="display: flex; gap: 20px;">
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_hu-lora_srand3.wav"></audio>
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_hu-lora-1.wav"></audio>
</div>