🦜 VieNeu-TTS v2 Turbo β€” GGUF

Ultra-fast Vietnamese & English TTS that runs entirely on CPU; no GPU required.



📖 Model Description

VieNeu-TTS v2 Turbo is the lightweight, CPU-optimized edition of the VieNeu-TTS family, a state-of-the-art Vietnamese text-to-speech system. Quantized to GGUF format and paired with an ONNX neural codec, this model delivers near-real-time speech synthesis on commodity hardware: laptops, edge devices, and even Raspberry Pi-class machines.

This repository hosts the GGUF quantized weights intended for use with llama-cpp-python as the inference backend, alongside the companion ONNX codec for waveform generation.

What makes it special?

  • 🇻🇳🇺🇸 Bilingual (Code-switching): Naturally handles mixed Vietnamese–English sentences, powered by sea-g2p. No need to pre-label language boundaries.
  • ⚡ Extreme Speed: Optimized GGUF quantization achieves real-time or faster inference on a standard CPU.
  • 💻 Zero GPU Dependency: Runs fully offline on any x86_64 / ARM64 machine with sufficient RAM.
  • 🔇 AI Watermarking: Audio output embeds an imperceptible identifier for responsible AI content tracing.
  • 🔊 24 kHz Audio: High-fidelity waveform output suitable for production applications.
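The real-time claim above can be checked on your own hardware by timing a call to `tts.infer` and comparing it against the duration of the audio returned. A minimal sketch of the arithmetic, assuming the output is a buffer of raw samples at 24 kHz (the sample count and timing numbers are illustrative, not measured):

```python
def audio_duration(num_samples: int, sample_rate: int = 24_000) -> float:
    """Duration in seconds of a raw sample buffer at the given rate."""
    return num_samples / sample_rate

def real_time_factor(synth_seconds: float, audio_seconds: float) -> float:
    """RTF < 1.0 means synthesis runs faster than real time."""
    return synth_seconds / audio_seconds

# Illustrative numbers: 72,000 samples at 24 kHz is 3.0 s of audio,
# synthesized here in a hypothetical 2.1 s of wall-clock time.
duration = audio_duration(72_000)
print(f"RTF = {real_time_factor(2.1, duration):.2f}")  # RTF = 0.70
```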

🗂️ Repository Contents

| File | Description |
|------|-------------|
| vieneu-v2-turbo-*.gguf | GGUF quantized LLM backbone (multiple quant levels) |

🚀 Quickstart

Option 1 – Install via vieneu SDK (Recommended)

```shell
# Minimal installation (Turbo/CPU only)
pip install vieneu

# Optional: pre-built llama-cpp-python wheels for CPU (if building from source fails)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/

# Optional: macOS Metal acceleration
pip install vieneu --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal/
```
```python
from vieneu import Vieneu

# Initialize in Turbo mode (default; minimal dependencies)
tts = Vieneu()

# 1. Simple synthesis (uses the default Southern male voice, 'Xuân Vĩnh')
# Mixed VI-EN input: "The electrical system mainly uses alternating current
# because it is more efficient."
text = "Hệ thống điện chủ yếu sử dụng alternating current because it is more efficient."
audio = tts.infer(text=text)

# Save to file
tts.save(audio, "output_xuan_vinh.wav")
print("💾 Saved to output_xuan_vinh.wav")

# 2. Using a specific preset voice
voices = tts.list_preset_voices()
for desc, voice_id in voices:
    print(f"Voice: {desc} (ID: {voice_id})")

my_voice_id = voices[1][1] if len(voices) > 1 else voices[0][1]  # Phạm Tuyên's voice
voice_data = tts.get_preset_voice(my_voice_id)

# "I am speaking with Dr. Tuyên's voice."
audio_custom = tts.infer(text="Tôi đang nói bằng giọng của Bác sĩ Tuyên.", voice=voice_data)

# 3. Save to file
tts.save(audio_custom, "output_pham_tuyen.wav")
print("💾 Saved to output_pham_tuyen.wav")
```
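The preset-voice calls above compose naturally into a batch helper, e.g. to render the same sentence once per preset voice. This sketch reuses only the SDK methods shown above; the output-filename scheme is an assumption of this example:

```python
from pathlib import Path

def render_all_presets(tts, text: str, out_dir: str = ".") -> list[str]:
    """Synthesize `text` once per preset voice; return the written file paths."""
    paths = []
    for _desc, voice_id in tts.list_preset_voices():
        voice = tts.get_preset_voice(voice_id)
        audio = tts.infer(text=text, voice=voice)
        out = str(Path(out_dir) / f"output_{voice_id}.wav")
        tts.save(audio, out)
        paths.append(out)
    return paths
```

Usage would be `render_all_presets(tts, "Xin chào")`, producing one WAV per preset voice.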

🦜 Zero-shot Voice Cloning (SDK)

Clone any voice with only 3-5 seconds of audio using the local Turbo engine:

```python
from vieneu import Vieneu

tts = Vieneu()  # defaults to Turbo mode

# 1. Encode the reference audio (extracts the speaker embedding)
# Supported formats: .wav, .mp3, .flac
my_voice = tts.encode_reference("examples/audio_ref/example.wav")

# 2. Synthesize with the cloned voice
# No reference transcript is required for Turbo v2.
# "This is a voice cloned directly with the VieNeu-TTS SDK."
audio = tts.infer(
    text="Đây là giọng nói được clone trực tiếp bằng SDK của VieNeu-TTS.",
    voice=my_voice
)

tts.save(audio, "cloned_voice.wav")
```
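Because `encode_reference` is the expensive step when the same reference voice is reused across runs, caching its result can pay off. This is a hypothetical sketch that assumes the returned voice object is picklable; verify that against your installed SDK version:

```python
import pickle
from pathlib import Path

def load_or_encode(tts, ref_audio: str, cache_path: str):
    """Encode a reference once, then reuse the pickled result on later runs."""
    cache = Path(cache_path)
    if cache.exists():
        return pickle.loads(cache.read_bytes())
    voice = tts.encode_reference(ref_audio)  # the expensive step
    cache.write_bytes(pickle.dumps(voice))
    return voice
```

With this helper, `voice = load_or_encode(tts, "example.wav", "example.voice.pkl")` skips re-encoding on every run after the first.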

Option 2 – Web UI (Full repo)

```shell
git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
uv sync          # minimal install (Turbo/CPU)
uv run vieneu-web
# → open http://127.0.0.1:7860
```

🔬 Model Architecture

VieNeu-TTS v2 Turbo is a two-stage TTS system:

  1. LLM Backbone (GGUF): A transformer language model conditioned on text tokens and speaker embeddings. It predicts discrete audio codec tokens autoregressively.
  2. Neural Codec (ONNX): A VQ-VAE-based neural codec (VieNeu-Codec) decodes the predicted token sequence into a 24 kHz waveform.

The bilingual capability is enabled by sea-g2p, which converts mixed-language graphemes to phonemes before the LLM backbone processes them.
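The flow above can be sketched with stand-in functions. The real phoneme inventory, token vocabulary, speaker-embedding size, and codec frame rate are internal to the SDK, so every value below is illustrative only:

```python
def g2p(text: str) -> list[str]:
    # Stand-in for sea-g2p: graphemes -> phonemes (here, a naive word split).
    return text.split()

def llm_backbone(phonemes: list[str], speaker_embedding: list[float]) -> list[int]:
    # Stand-in for the GGUF LLM: autoregressive prediction of discrete codec tokens.
    return [sum(map(ord, p)) % 1024 for p in phonemes]

def codec_decode(tokens: list[int], sample_rate: int = 24_000) -> list[float]:
    # Stand-in for the ONNX codec: tokens -> waveform. Pretends each token
    # covers 20 ms of audio and emits silence of the matching length.
    return [0.0] * (len(tokens) * sample_rate // 50)

phonemes = g2p("xin chao")                    # stage 0: text -> phonemes
tokens = llm_backbone(phonemes, [0.0] * 192)  # stage 1: phonemes -> codec tokens
waveform = codec_decode(tokens)               # stage 2: tokens -> 24 kHz samples
print(len(waveform))  # 960 samples = 2 tokens * 20 ms at 24 kHz
```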


📊 Training Data

The model was trained on over 20,000 hours of combined Vietnamese and English speech data, covering a wide range of speakers, accents, recording conditions, and speaking styles.

| Dataset | Language | Description |
|---------|----------|-------------|
| pnnbao-ump/VieNeu-TTS-1000h | Vietnamese | Curated studio-quality Vietnamese speech corpus |
| pnnbao-ump/vietnamese-audio-corpus | Vietnamese | Diverse multi-speaker Vietnamese audio |
| amphion/Emilia-Dataset | Multilingual | Large-scale multilingual speech dataset |
| facebook/multilingual_librispeech | English + others | Multilingual read speech |

🗺️ Roadmap

  • GGUF/ONNX Turbo engine
  • Bilingual (Vietnamese–English) code-switching
  • Turbo Voice Cloning
  • Mobile SDK (Android / iOS)
  • Streaming output API

🤝 Related Resources

| Resource | Link |
|----------|------|
| 📦 PyPI Package | `pip install vieneu` |
| 🐙 GitHub | pnnbao97/VieNeu-TTS |
| 📖 Documentation | docs.vieneu.io |
| 🤗 Full Model (GPU) | pnnbao-ump/VieNeu-TTS |
| 💬 Discord Community | Join here |
| ☕ Support the project | buymeacoffee.com/pnnbao |

📄 License

This model is released under the Apache License 2.0 and is free for personal and commercial use.


Made with ❤️ for the Vietnamese TTS community by @pnnbao97 and contributors.

Model size: 0.1B parameters · Architecture: qwen3 (GGUF)