🦜 VieNeu-TTS v2 Turbo — GGUF

Ultra-fast Vietnamese & English TTS — runs entirely on CPU, no GPU required.

📖 Model Description

VieNeu-TTS v2 Turbo is the lightweight, CPU-optimized edition of the VieNeu-TTS family of Vietnamese text-to-speech models. Quantized to GGUF format and paired with an ONNX neural codec, it delivers near-real-time speech synthesis on commodity hardware: laptops, edge devices, and even Raspberry Pi-class machines.

This repository hosts the GGUF quantized weights intended for use with llama-cpp-python as the inference backend, alongside the companion ONNX codec for waveform generation.

What makes it special?

🇻🇳🇺🇸 Bilingual (Code-switching): Naturally handles mixed Vietnamese–English sentences, powered by sea-g2p. No need to pre-label language boundaries.
⚡ Extreme Speed: Optimized GGUF quantization achieves real-time or faster inference on a standard CPU.
💻 Zero GPU Dependency: Runs fully offline on any x86_64 / ARM64 machine with sufficient RAM.
🔇 AI Watermarking: Audio output embeds an imperceptible identifier for responsible AI content tracing.
🔊 24 kHz Audio: High-fidelity waveform output suitable for production applications.
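
"Real-time or faster" is usually expressed as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The sketch below computes it for 24 kHz output. The function name and the timing numbers are illustrative assumptions, not measurements of this model.

```python
SAMPLE_RATE_HZ = 24_000  # output sample rate stated in this model card


def real_time_factor(num_samples: int, synthesis_seconds: float,
                     sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return synthesis time divided by audio duration (lower is faster)."""
    if num_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("num_samples and sample_rate_hz must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return synthesis_seconds / audio_seconds


# Hypothetical numbers: 5 s of audio (120,000 samples) synthesized in 2.5 s.
rtf = real_time_factor(num_samples=120_000, synthesis_seconds=2.5)
print(f"RTF = {rtf:.2f}")  # → RTF = 0.50
```

To measure this on your own machine, time a synthesis call with `time.perf_counter()` and pass the number of samples in the returned audio.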

🗂️ Repository Contents

| File | Description |
| --- | --- |
| `vieneu-v2-turbo-*.gguf` | GGUF quantized LLM backbone (multiple quant levels) |
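
After downloading a quant file, a quick sanity check is to read its header: GGUF files begin with the four magic bytes `GGUF` followed by a little-endian 32-bit format version. The helper below is a minimal sketch using only the standard library. The function name is illustrative, and it does not validate anything beyond those first eight bytes.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_version(path) -> int:
    """Return the GGUF format version, or raise ValueError if the magic is wrong."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version


# Check every matching file in the current directory (prints nothing if none exist).
for gguf_path in sorted(Path(".").glob("vieneu-v2-turbo-*.gguf")):
    print(gguf_path.name, "-> GGUF version", read_gguf_version(gguf_path))
```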

🚀 Quickstart

Option 1 — Install via `vieneu` SDK (Recommended)

# Minimal installation (Turbo/CPU Only)
pip install vieneu

# Optional: Pre-built llama-cpp-python for CPU (if building fails)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/

# Optional: macOS Metal acceleration
pip install vieneu --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal/

from vieneu import Vieneu

# Initialize in Turbo mode (Default - Minimal dependencies)
tts = Vieneu()

# 1. Simple synthesis (uses default Southern Male voice 'Xuân Vĩnh')
text = "Hệ thống điện chủ yếu sử dụng alternating current because it is more efficient."
audio = tts.infer(text=text)

# Save to file
tts.save(audio, "output_Xuân Vĩnh.wav")
print("💾 Saved to output_Xuân Vĩnh.wav")

# 2. Using a specific Preset Voice
voices = tts.list_preset_voices()
for desc, voice_id in voices:
    print(f"Voice: {desc} (ID: {voice_id})")

my_voice_id = voices[1][1] if len(voices) > 1 else voices[0][1]  # Phạm Tuyên voice
voice_data = tts.get_preset_voice(my_voice_id)

audio_custom = tts.infer(text="Tôi đang nói bằng giọng của Bác sĩ Tuyên.", voice=voice_data)

# Save to file
tts.save(audio_custom, "output_Phạm Tuyên.wav")
print("💾 Saved to output_Phạm Tuyên.wav")
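
To compare the preset voices side by side, the same calls can be wrapped in a loop. The helper below is a sketch that relies only on the methods shown above (`list_preset_voices`, `get_preset_voice`, `infer`, `save`) and assumes they behave as in that example. The helper name and the `preset_NN.wav` naming scheme are illustrative choices, not part of the SDK.

```python
from pathlib import Path


def synthesize_all_presets(tts, text, out_dir):
    """Render `text` once per preset voice and return (description, path) pairs.

    `tts` is any object exposing the four methods used in the quickstart.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (desc, voice_id) in enumerate(tts.list_preset_voices()):
        voice = tts.get_preset_voice(voice_id)
        audio = tts.infer(text=text, voice=voice)
        # ASCII-only file names avoid problems with spaces and diacritics.
        target = out_path / f"preset_{index:02d}.wav"
        tts.save(audio, str(target))
        written.append((desc, target))
    return written
```

With the SDK installed, `synthesize_all_presets(Vieneu(), "Xin chào", "samples")` would write one file per preset voice into `samples/`.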

🦜 Zero-shot Voice Cloning (SDK)

Clone a voice from just 3 to 5 seconds of reference audio using the local Turbo engine:

from vieneu import Vieneu

tts = Vieneu() # Defaults to Turbo mode

# 1. Encode the reference audio (extracts speaker embedding)
# Supported formats: .wav, .mp3, .flac
my_voice = tts.encode_reference("examples/audio_ref/example.wav")

# 2. Synthesize with the cloned voice
# No reference text required for Turbo v2!
audio = tts.infer(
    text="Đây là giọng nói được clone trực tiếp bằng SDK của VieNeu-TTS.",
    voice=my_voice
)

tts.save(audio, "cloned_voice.wav")
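
Before encoding a reference clip, it can help to confirm that it is long enough. The sketch below reads the duration of a `.wav` file with the standard-library `wave` module and compares it with a 3-second minimum, which is one reading of the "3 to 5 seconds" guidance above rather than a documented SDK limit. It handles uncompressed WAV only, so `.mp3` and `.flac` references would need another reader. Both function names are illustrative.

```python
import wave


def wav_duration_seconds(path) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def has_enough_reference_audio(path, min_seconds: float = 3.0) -> bool:
    """True if the clip is at least `min_seconds` long (assumed threshold)."""
    return wav_duration_seconds(path) >= min_seconds
```

Calling `has_enough_reference_audio("examples/audio_ref/example.wav")` before `encode_reference` lets you reject clips that are too short early.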

Option 2 — Web UI (Full repo)

git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
uv sync          # minimal install (Turbo/CPU)
uv run vieneu-web
# → Open http://127.0.0.1:7860

🔬 Model Architecture

VieNeu-TTS v2 Turbo is a two-stage TTS system:

1. LLM Backbone (GGUF): A transformer language model conditioned on text tokens and speaker embeddings. It predicts discrete audio codec tokens autoregressively.
2. Neural Codec (ONNX): A VQ-VAE-based neural codec (VieNeu-Codec) decodes the predicted token sequence into a 24 kHz waveform.

The bilingual capability is enabled by sea-g2p, which converts mixed-language graphemes to phonemes before the LLM backbone processes them.
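
The two stages can be pictured with a toy sketch: an autoregressive loop that keeps asking a model for the next codec token until it emits an end-of-sequence token, followed by a decoder that turns tokens into samples. Everything here is a stand-in chosen for illustration. The `next_token` callable, the token IDs, and the lookup-table "codec" are hypothetical and do not reflect the actual llama-cpp-python or ONNX calls used by VieNeu-TTS.

```python
from typing import Callable, Dict, List, Sequence


def generate_codec_tokens(prompt: Sequence[int],
                          next_token: Callable[[List[int]], int],
                          eos_id: int,
                          max_new_tokens: int = 2048) -> List[int]:
    """Stage 1 (toy): predict tokens one at a time until EOS or the length cap."""
    context = list(prompt)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        token = next_token(context)   # the model sees everything produced so far
        if token == eos_id:
            break
        context.append(token)
        generated.append(token)
    return generated


def decode_tokens(tokens: Sequence[int],
                  codebook: Dict[int, List[float]]) -> List[float]:
    """Stage 2 (toy): map each token to a short frame of samples and concatenate."""
    samples: List[float] = []
    for token in tokens:
        samples.extend(codebook[token])
    return samples


# Toy "model": always predicts the previous token plus one; 5 acts as EOS.
tokens = generate_codec_tokens(prompt=[1, 2],
                               next_token=lambda ctx: ctx[-1] + 1,
                               eos_id=5)
waveform = decode_tokens(tokens, codebook={3: [0.1, 0.2], 4: [0.3, 0.4]})
print(tokens, waveform)
```

In the real system, the prompt would carry the phonemized text and speaker conditioning, and the decoder would be a neural network rather than a table lookup.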

📊 Training Data

The model was trained on over 20,000 hours of combined Vietnamese and English speech data, covering a wide range of speakers, accents, recording conditions, and speaking styles.

| Dataset | Language | Description |
| --- | --- | --- |
| `pnnbao-ump/VieNeu-TTS-1000h` | Vietnamese | Curated studio-quality Vietnamese speech corpus |
| `pnnbao-ump/vietnamese-audio-corpus` | Vietnamese | Diverse multi-speaker Vietnamese audio |
| `amphion/Emilia-Dataset` | Multilingual | Large-scale multilingual speech dataset |
| `facebook/multilingual_librispeech` | English + others | Multilingual read speech |

🗺️ Roadmap

GGUF/ONNX Turbo engine
Bilingual (Vietnamese–English) code-switching
Turbo Voice Cloning
Mobile SDK (Android / iOS)
Streaming output API

🤝 Related Resources

| Resource | Link |
| --- | --- |
| 📦 PyPI Package | `pip install vieneu` |
| 🐙 GitHub | pnnbao97/VieNeu-TTS |
| 📖 Documentation | docs.vieneu.io |
| 🤗 Full Model (GPU) | pnnbao-ump/VieNeu-TTS |
| 💬 Discord Community | Join here |
| ☕ Support the project | buymeacoffee.com/pnnbao |

📄 License

This model is released under the Apache License 2.0 — free for personal and commercial use.

Made with ❤️ for the Vietnamese TTS community by @pnnbao97 and contributors.


Format: GGUF
Model size: 0.1B params
Architecture: qwen3
