---
license: mit
language:
- en
- multilingual
library_name: transformers
tags:
- text-to-speech
- audio
- tts
- voice
- quantized
- 8bit
- bitsandbytes
- vibevoice
pipeline_tag: text-to-audio
model-index:
- name: VibeVoice-Large-Q8
  results: []
---

# VibeVoice-Large-Q8 - Selective 8-bit Quantization
**The first 8-bit VibeVoice model that actually works**

[![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
[![Model Size](https://img.shields.io/badge/size-11.6%20GB-green)](https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8)
[![Quality](https://img.shields.io/badge/audio-identical%20quality-brightgreen)](https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8)

[🤗 Model](https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8) • [💻 ComfyUI](https://github.com/Enemyx-net/VibeVoice-ComfyUI) • [📖 Docs](https://github.com/Enemyx-net/VibeVoice-ComfyUI/blob/main/README.md)

---

## 🎯 Why This Model is Different

If you've tried other 8-bit quantized VibeVoice models, you probably got nothing but static noise. **This one actually works.**

The secret? **Selective quantization**: I only quantized the language model (the most robust part), while keeping audio-critical components (diffusion head, VAE, connectors) at full precision.

### Results

- ✅ Perfect audio, identical to the original model
- ✅ 11.6 GB instead of 18.7 GB (-38%)
- ✅ Uses ~12 GB VRAM instead of 20 GB
- ✅ Works on 12 GB GPUs (RTX 3060, 4070 Ti, etc.)

---

## 🚨 The Problem with Other 8-bit Models

Most 8-bit models you'll find online quantize **everything** aggressively, audio components included.

**Result:** Audio components get quantized → numerical errors propagate → audio = pure noise.

---

## ✅ The Solution: Selective Quantization

I only quantized what can be safely quantized without losing quality: the language model.

**Result:** 52% of parameters quantized, 48% at full precision = perfect audio quality.

---

## 📊 Quick Comparison

| Model | Size | Audio Quality | Status |
|-------|------|---------------|--------|
| Original VibeVoice | 18.7 GB | ⭐⭐⭐⭐⭐ | Full precision |
| Other 8-bit models | 10.6 GB | 💥 NOISE | ❌ Don't work |
| **This model** | **11.6 GB** | ⭐⭐⭐⭐⭐ | ✅ **Perfect** |

+1.0 GB vs other 8-bit models = perfect audio instead of noise. Worth it.

---

## 💻 How to Use It

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
import scipy.io.wavfile as wavfile

# Load model (the quantized weights need an NVIDIA GPU with CUDA)
model = AutoModelForCausalLM.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    trust_remote_code=True,
)

# Generate audio
text = "Hello, this is VibeVoice speaking."
inputs = processor(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=None)

# Save as a 24 kHz WAV file.
# NumPy has no bfloat16 dtype, so cast to float32 before converting.
audio = output.speech_outputs[0].float().cpu().numpy()
wavfile.write("output.wav", 24000, audio)
```

### With ComfyUI (recommended)

1. Install the custom node:

   ```bash
   cd ComfyUI/custom_nodes
   git clone https://github.com/Enemyx-net/VibeVoice-ComfyUI
   ```

2. Download this model to `ComfyUI/models/vibevoice/`
3. Restart ComfyUI and use it normally!

---

## 💾 System Requirements

### Minimum

- **VRAM:** 12 GB
- **RAM:** 16 GB
- **GPU:** NVIDIA with CUDA (required)
- **Storage:** 12 GB free (the model is 11.6 GB)

### Recommended

- **VRAM:** 16+ GB
- **RAM:** 32 GB
- **GPU:** RTX 3090/4090, A5000 or better

⚠️ **Not supported:** CPU, Apple Silicon (MPS), AMD GPUs

---

## ⚠️ Limitations

1. **Requires NVIDIA GPU with CUDA** - won't work on CPU, Apple Silicon, or AMD GPUs
2. **Inference only** - don't use for fine-tuning
3. **Requires:**
   - `transformers>=4.51.3`
   - `bitsandbytes>=0.43.0`

---

## 🆚 When to Use This Model

### ✅ Use this 8-bit if:

- You have 12-16 GB VRAM
- You want maximum quality with reduced size
- You need a production-ready model
- You want the best size/quality balance

### Use full precision (18.7 GB) if:

- You have plenty of VRAM (24+ GB)
- You're doing research requiring absolute precision

### Use 4-bit NF4 (~6.6 GB) if:

- You only have 8-10 GB VRAM
- You can accept a small quality trade-off

---

## 🔧 Troubleshooting

### "OutOfMemoryError" during loading

- Close other GPU applications
- Use `device_map="auto"`
- Reduce batch size to 1

### "BitsAndBytes not found"

```bash
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "bitsandbytes>=0.43.0"
```

### Audio sounds distorted

This shouldn't happen! If it does:

1. Verify you downloaded the correct model
2. Update transformers: `pip install --upgrade transformers`
3. Check CUDA: `torch.cuda.is_available()` should return `True`

---

## 📚 Citation

```bibtex
@misc{vibevoice-q8-2025,
  title={VibeVoice-Large-Q8: Selective 8-bit Quantization for Audio Quality},
  author={Fabio Sarracino},
  year={2025},
  url={https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8}
}
```

### Original Model

```bibtex
@misc{vibevoice2024,
  title={VibeVoice: High-Quality Text-to-Speech with Large Language Models},
  author={Microsoft Research},
  year={2024},
  url={https://github.com/microsoft/VibeVoice}
}
```

---

## 🔗 Related Resources

- [Original Model](https://huggingface.co/aoi-ot/VibeVoice-Large) - Full precision base
- [ComfyUI Node](https://github.com/Enemyx-net/VibeVoice-ComfyUI) - ComfyUI integration

---

## 📜 License

MIT License.

---

## 🤝 Support

- **Issues:** [GitHub Issues](https://github.com/Enemyx-net/VibeVoice-ComfyUI/issues)
- **Questions:** [HuggingFace Discussions](https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8/discussions)

If this model helped you, leave a ⭐ on GitHub!

---
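## 🧪 Sketch: Picking What Stays at Full Precision

The selective quantization idea from this card (quantize the language model, leave the diffusion head, VAE, and connectors alone) comes down to choosing which modules to exclude. The snippet below is a minimal sketch of that selection step only, not the script used to build this model. The name fragments in `KEEP_FULL_PRECISION`, the helper `modules_to_skip`, and the example module names are all hypothetical; the real module names in VibeVoice may differ.

```python
from typing import Iterable, List

# Hypothetical name fragments for the audio-critical components named in this
# card (diffusion head, VAE, connectors). Real module names may differ.
KEEP_FULL_PRECISION = ("diffusion_head", "vae", "connector")


def modules_to_skip(
    module_names: Iterable[str],
    keep: Iterable[str] = KEEP_FULL_PRECISION,
) -> List[str]:
    """Return the module names that should stay at full precision."""
    fragments = tuple(keep)
    return [
        name
        for name in module_names
        if any(fragment in name for fragment in fragments)
    ]


# Made-up module names, for illustration only.
example_modules = [
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.down_proj",
    "diffusion_head.layers.0.linear",
    "acoustic_vae.decoder.conv1",
    "acoustic_connector.fc1",
]

skip = modules_to_skip(example_modules)
for name in skip:
    print("keep at full precision:", name)
```

A list like `skip` is the kind of value accepted by the `llm_int8_skip_modules` parameter of `transformers.BitsAndBytesConfig` (used together with `load_in_8bit=True`), which excludes the listed modules from 8-bit conversion. Whether this model was produced exactly that way is an assumption.

---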
**Created by [Fabio Sarracino](https://github.com/Enemyx-net)**

*The first 8-bit VibeVoice model that actually works*

[🤗 HuggingFace](https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8) • [💻 GitHub](https://github.com/Enemyx-net/VibeVoice-ComfyUI)