	---
	license: cc-by-4.0
	language:
	- en
	pipeline_tag: text-to-speech
	tags:
	- voxtream
	- text-to-speech
	---

	# Model Card for VoXtream

	VoXtream is a fully autoregressive, zero-shot streaming text-to-speech system for real-time use that begins speaking from the first word.

	### Key features

	- Streaming: Supports a full-stream scenario, in which the full sentence is not known in advance. The model takes a text stream arriving word by word as input and outputs an audio stream in 80 ms chunks.
	- Speed: Runs 5x faster than real time and achieves 102 ms first-packet latency on GPU.
	- Quality and efficiency: With only 9k hours of training data, it matches or surpasses the quality and intelligibility of larger models or models trained on large datasets.
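
To make the streaming and speed figures above concrete, here is a minimal arithmetic sketch. It does not call the VoXtream API; the helper names (`chunks_needed`, `generation_time_ms`, `real_time_factor`) are hypothetical, and it assumes that "5x faster than real time" means the model needs one fifth of the playback duration to generate the audio.

```python
import math

# Figures quoted in this model card.
CHUNK_MS = 80          # duration of one output audio chunk
SPEEDUP = 5.0          # generation speed relative to real time (assumed meaning)
FIRST_PACKET_MS = 102  # first-packet latency on GPU


def chunks_needed(audio_ms: float, chunk_ms: int = CHUNK_MS) -> int:
    """Number of fixed-size chunks required to cover audio_ms of speech."""
    return math.ceil(audio_ms / chunk_ms)


def generation_time_ms(audio_ms: float, speedup: float = SPEEDUP) -> float:
    """Approximate compute time for audio_ms of speech at the given speedup."""
    return audio_ms / speedup


def real_time_factor(speedup: float = SPEEDUP) -> float:
    """Compute time divided by audio duration; values below 1 keep up with playback."""
    return 1.0 / speedup


if __name__ == "__main__":
    utterance_ms = 4000  # hypothetical 4-second utterance
    print("chunks:", chunks_needed(utterance_ms))
    print("generation time (ms):", generation_time_ms(utterance_ms))
    print("real-time factor:", real_time_factor())
    print("listener hears first audio after about (ms):", FIRST_PACKET_MS)
```

Under these assumptions, each 80 ms chunk takes roughly 16 ms to compute, so generation stays ahead of playback, and the delay a listener perceives is governed by the first-packet latency rather than by the total generation time.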

	### Model Sources

	- Repository: [repo](https://github.com/herimor/voxtream)
	- Paper: [paper](https://herimor.github.io/voxtream)
	- Demo: [demo](https://herimor.github.io/voxtream)

	## Get started

	Clone our [repo](https://github.com/herimor/voxtream) and follow the instructions in its README file.

	### Out-of-Scope Use

	Any organization or individual is prohibited from using any technology mentioned in this paper to generate a person's speech without their consent, including but not limited to the speech of government leaders, political figures, and celebrities. Failure to comply with this condition may place you in violation of copyright laws.

	## Training Data

	The model was trained on a 9k-hour subset of the [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset) and [HiFiTTS2](https://huggingface.co/datasets/nvidia/hifitts-2) datasets. For more details, please see our paper.

	## Citation

	```
	@article{torgashov2025voxtream,
	author = {Torgashov, Nikita and Henter, Gustav Eje and Skantze, Gabriel},
	title = {Vo{X}tream: Full-Stream Text-to-Speech with Extremely Low Latency},
	journal = {arXiv},
	year = {2025}
	}
	```