VieNeu-TTS v2 Turbo (GGUF)
Ultra-fast Vietnamese and English TTS that runs entirely on CPU, no GPU required.
Model Description
VieNeu-TTS v2 Turbo is the lightweight, CPU-optimized edition of the VieNeu-TTS family, a state-of-the-art Vietnamese text-to-speech system. Quantized to GGUF format and paired with an ONNX neural codec, it delivers near-real-time speech synthesis on commodity hardware: laptops, edge devices, and even Raspberry Pi-class machines.
This repository hosts the GGUF quantized weights intended for use with llama-cpp-python as the inference backend, alongside the companion ONNX codec for waveform generation.
What makes it special?
- Bilingual (code-switching): Naturally handles mixed Vietnamese-English sentences, powered by sea-g2p. No need to pre-label language boundaries.
- Extreme Speed: Optimized GGUF quantization achieves real-time or faster inference on a standard CPU.
- Zero GPU Dependency: Runs fully offline on any x86_64 / ARM64 machine with sufficient RAM.
- AI Watermarking: Audio output embeds an imperceptible identifier for responsible AI content tracing.
- 24 kHz Audio: High-fidelity waveform output suitable for production applications.
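The "real-time or faster" claim is commonly quantified as a real-time factor (RTF): synthesis time divided by the duration of the generated audio, where an RTF of 1.0 or lower means real time or faster. A minimal sketch, assuming the audio is a flat sequence of samples at 24 kHz; `real_time_factor` is a hypothetical helper and not part of the vieneu SDK:

```python
SAMPLE_RATE_HZ = 24_000  # output rate stated in this model card


def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return synthesis time divided by audio duration (lower is faster)."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return synthesis_seconds / audio_seconds


# Example: 48,000 samples (2 s of audio) produced in 1.5 s of wall-clock time.
rtf = real_time_factor(1.5, 48_000)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 0.75", i.e. faster than real time
```

To measure this on your own machine, one option is to time the `tts.infer(...)` call with `time.perf_counter()` and pass the number of returned samples, assuming the returned audio is a one-dimensional sample array.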
Repository Contents
| File | Description |
|---|---|
| `vieneu-v2-turbo-*.gguf` | GGUF quantized LLM backbone (multiple quant levels) |
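Because several quantization levels share the `vieneu-v2-turbo-*.gguf` pattern, a script can select a file by matching that pattern. A minimal sketch using only the standard library; `quant_variants` is a hypothetical helper, and the file names in the example (`q4_0`, `q8_0`) are placeholders, so check the repository's file list for the actual names:

```python
from fnmatch import fnmatchcase

GGUF_PATTERN = "vieneu-v2-turbo-*.gguf"  # pattern from the table above


def quant_variants(filenames: list[str]) -> dict[str, str]:
    """Map the wildcard part of each matching file name to the file name."""
    prefix, suffix = GGUF_PATTERN.split("*")
    variants = {}
    for name in filenames:
        if fnmatchcase(name, GGUF_PATTERN):
            # Keep only the text that the "*" matched, e.g. the quant tag.
            variants[name[len(prefix):len(name) - len(suffix)]] = name
    return variants


# Hypothetical directory listing, for illustration only.
listing = ["vieneu-v2-turbo-q4_0.gguf", "vieneu-v2-turbo-q8_0.gguf", "README.md"]
print(quant_variants(listing))
```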
Quickstart
Option 1: Install via the vieneu SDK (Recommended)
# Minimal installation (Turbo/CPU Only)
pip install vieneu
# Optional: Pre-built llama-cpp-python for CPU (if building fails)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/
# Optional: macOS Metal acceleration
pip install vieneu --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal/
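After installing, a quick way to confirm that the packages are visible to Python is to look them up without importing them. A minimal sketch; it assumes the module names `vieneu` (as imported in this README) and `llama_cpp` (the import name of llama-cpp-python), and `missing_modules` is a hypothetical helper:

```python
from importlib.util import find_spec


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(["vieneu", "llama_cpp"])
print("All set" if not missing else f"Not installed: {', '.join(missing)}")
```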
from vieneu import Vieneu
# Initialize in Turbo mode (default, minimal dependencies)
tts = Vieneu()
# 1. Simple synthesis (uses the default Southern male voice 'Xuân Vĩnh')
# The text mixes Vietnamese and English: "The electrical system mainly uses
# alternating current because it is more efficient."
text = "Hệ thống điện chủ yếu sử dụng alternating current because it is more efficient."
audio = tts.infer(text=text)
# Save to file
tts.save(audio, "output_Xuân Vĩnh.wav")
print("Saved to output_Xuân Vĩnh.wav")
# 2. Using a specific preset voice
voices = tts.list_preset_voices()
for desc, voice_id in voices:
    print(f"Voice: {desc} (ID: {voice_id})")
my_voice_id = voices[1][1] if len(voices) > 1 else voices[0][1]  # Phạm Tuyên voice
voice_data = tts.get_preset_voice(my_voice_id)
# Vietnamese for "I am speaking in the voice of Doctor Tuyên."
audio_custom = tts.infer(text="Tôi đang nói bằng giọng của Bác sĩ Tuyên.", voice=voice_data)
# 3. Save to file
tts.save(audio_custom, "output_Phạm Tuyên.wav")
print("Saved to output_Phạm Tuyên.wav")
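The output file names in this example contain spaces and Vietnamese diacritics, which some shells and file systems handle awkwardly. If that is a concern, a voice name can be reduced to an ASCII-only file name. A minimal sketch using the standard library; `ascii_filename` is a hypothetical helper, not part of the vieneu SDK:

```python
import re
import unicodedata


def ascii_filename(voice_name: str, extension: str = ".wav") -> str:
    """Build an ASCII-only file name such as 'output_Xuan_Vinh.wav'."""
    # 'đ'/'Đ' have no Unicode decomposition, so map them by hand.
    text = voice_name.replace("đ", "d").replace("Đ", "D")
    # NFKD splits letters from their combining marks; drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse anything that is not an ASCII letter or digit into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", stripped).strip("_")
    return f"output_{slug}{extension}"


print(ascii_filename("Xuân Vĩnh"))   # output_Xuan_Vinh.wav
print(ascii_filename("Phạm Tuyên"))  # output_Pham_Tuyen.wav
```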
Zero-shot Voice Cloning (SDK)
Clone any voice with only 3-5 seconds of audio using the local Turbo engine:
from vieneu import Vieneu
tts = Vieneu() # Defaults to Turbo mode
# 1. Encode the reference audio (extracts speaker embedding)
# Supported formats: .wav, .mp3, .flac
my_voice = tts.encode_reference("examples/audio_ref/example.wav")
# 2. Synthesize with the cloned voice
# No reference text required for Turbo v2!
# Vietnamese for "This is a voice cloned directly with the VieNeu-TTS SDK."
audio = tts.infer(
    text="Đây là giọng nói được clone trực tiếp bằng SDK của VieNeu-TTS.",
    voice=my_voice,
)
tts.save(audio, "cloned_voice.wav")
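Since cloning expects roughly 3-5 seconds of reference audio, it can help to check a clip's length before encoding it. A minimal sketch for `.wav` files using the standard-library `wave` module; `wav_duration_seconds` and `is_suitable_reference` are hypothetical helpers, and treating 3-5 seconds as hard bounds is an assumption made for illustration:

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_suitable_reference(path: str, min_s: float = 3.0, max_s: float = 5.0) -> bool:
    """Check that a clip falls inside the assumed 3-5 second window."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

For example, `is_suitable_reference("examples/audio_ref/example.wav")` could be called before `tts.encode_reference(...)`. The `wave` module reads only uncompressed WAV, so `.mp3` and `.flac` references would need a different reader.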
Option 2: Web UI (Full repo)
git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
uv sync # minimal install (Turbo/CPU)
uv run vieneu-web
# Then open http://127.0.0.1:7860 in a browser
Model Architecture
VieNeu-TTS v2 Turbo is a two-stage TTS system:
- LLM Backbone (GGUF): A transformer language model conditioned on text tokens and speaker embeddings. It predicts discrete audio codec tokens autoregressively.
- Neural Codec (ONNX): A VQ-VAE-based neural codec (VieNeu-Codec) decodes the predicted token sequence into a 24 kHz waveform.
The bilingual capability is enabled by sea-g2p, which converts mixed-language graphemes to phonemes before the LLM backbone processes them.
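The data flow described above can be sketched as three composed stages. This is an illustrative sketch only: `synthesize`, the stage signatures, and the toy stand-ins are hypothetical and do not reflect the actual sea-g2p, llama-cpp-python, or VieNeu-Codec APIs, and speaker-embedding conditioning is omitted for brevity.

```python
from typing import Callable, List

Phonemes = List[str]
CodecTokens = List[int]
Waveform = List[float]


def synthesize(
    text: str,
    g2p: Callable[[str], Phonemes],
    backbone: Callable[[Phonemes], CodecTokens],
    codec_decode: Callable[[CodecTokens], Waveform],
) -> Waveform:
    """Run text through G2P, the token-predicting backbone, then the codec."""
    phonemes = g2p(text)          # graphemes to phonemes (the role of sea-g2p)
    tokens = backbone(phonemes)   # phonemes to discrete codec tokens (the GGUF LLM)
    return codec_decode(tokens)   # codec tokens to waveform samples (the ONNX codec)


# Toy stand-ins so the sketch runs; the real stages are neural models.
def toy_g2p(text: str) -> Phonemes:
    return text.lower().split()


def toy_backbone(phonemes: Phonemes) -> CodecTokens:
    return [len(p) for p in phonemes]


def toy_codec(tokens: CodecTokens) -> Waveform:
    return [t / 10.0 for t in tokens]


print(synthesize("xin chao world", toy_g2p, toy_backbone, toy_codec))
```

Passing the stages in as callables keeps the sketch independent of any particular backend; each stand-in can be swapped for a real component without changing `synthesize`.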
Training Data
The model was trained on over 20,000 hours of combined Vietnamese and English speech data, covering a wide range of speakers, accents, recording conditions, and speaking styles.
| Dataset | Language | Description |
|---|---|---|
| pnnbao-ump/VieNeu-TTS-1000h | Vietnamese | Curated studio-quality Vietnamese speech corpus |
| pnnbao-ump/vietnamese-audio-corpus | Vietnamese | Diverse multi-speaker Vietnamese audio |
| amphion/Emilia-Dataset | Multilingual | Large-scale multilingual speech dataset |
| facebook/multilingual_librispeech | English + others | Multilingual read speech |
Roadmap
- GGUF/ONNX Turbo engine
- Bilingual (Vietnamese-English) code-switching
- Turbo Voice Cloning
- Mobile SDK (Android / iOS)
- Streaming output API
Related Resources
| Resource | Link |
|---|---|
| PyPI Package | pip install vieneu |
| GitHub | pnnbao97/VieNeu-TTS |
| Documentation | docs.vieneu.io |
| Full Model (GPU) | pnnbao-ump/VieNeu-TTS |
| Discord Community | Join here |
| Support the project | buymeacoffee.com/pnnbao |
License
This model is released under the Apache License 2.0, free for personal and commercial use.
Made with ❤️ for the Vietnamese TTS community by @pnnbao97 and contributors.