---
base_model:
- aoi-ot/VibeVoice-Large
tags:
- text-to-speech
- tts
- lora
- vibevoice
datasets:
- mozilla-foundation/common_voice_17_0
language:
- hu
---
# VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17
This is a LoRA finetune of the VibeVoice 7B (Large) model on a Hungarian audio dataset.
For this particular test I used the train split of the Hungarian config of the CommonVoice 17.0 dataset.

To finetune the model I used the [following code base](https://github.com/voicepowered-ai/VibeVoice-finetuning). 

Thank you to [JPGallegoar](https://github.com/jpgallegoar-vpai) for the amazing VibeVoice trainer!

## Inference
To use the LoRA model, you can use [my modified fork](https://github.com/cseti007/VibeVoice)
until [this PR](https://github.com/vibevoice-community/VibeVoice/pull/6)
is merged into the main branch of [VibeVoice Community's repository](https://github.com/vibevoice-community/VibeVoice).

## Examples

**Voice without LoRA**
<div style="display: flex; gap: 20px;">
  <audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_nolora-1.wav"></audio>
  <audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s98765_nolora-1.wav"></audio>
</div>


**Voice WITH LoRA**
<div style="display: flex; gap: 20px;">
  <audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_hu-lora_srand3.wav"></audio>
  <audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_hu-lora-1.wav"></audio>
</div>
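
If the embedded players do not work in your viewer, the same samples can be fetched for offline comparison. The snippet below is only a minimal sketch using the Python standard library: the file paths are copied from the players above, while the helper names `asset_url` and `download_samples` and the `samples` target folder are my own illustrative choices, not part of any VibeVoice tooling.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Repository and file paths taken from the audio players above.
REPO_ID = "Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17"
SAMPLE_FILES = [
    "assets/synth_s42_nolora-1.wav",     # without LoRA
    "assets/synth_s98765_nolora-1.wav",  # without LoRA
    "assets/synth_hu-lora_srand3.wav",   # with LoRA
    "assets/synth_s42_hu-lora-1.wav",    # with LoRA
]


def asset_url(filename, repo_id=REPO_ID, revision="main"):
    """Build the direct download URL for a file in the model repository."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_samples(target_dir="samples"):
    """Download every sample into target_dir (hypothetical folder name)."""
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in SAMPLE_FILES:
        dest = out / Path(name).name
        urlretrieve(asset_url(name), str(dest))  # needs network access
        paths.append(dest)
    return paths
```

Calling `download_samples()` should leave four `.wav` files in the `samples` folder, two generated without the LoRA and two with it.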