---
title: xVASynth TTS
emoji: 🧝‍♀️🧛‍♂️🧚‍♀️
colorFrom: gray
colorTo: gray
sdk: gradio
python_version: 3.9
sdk_version: 4.20.0
models:
- Pendrokar/xvapitch_nvidia
- Pendrokar/xvapitch_expresso
- Pendrokar/TorchMoji
- Pendrokar/xvasynth_lojban
- Pendrokar/xvasynth_cabal
app_file: app.py
app_port: 7860
tags:
- tts
- t2s
- sts
- s2s
pinned: true
preload_from_hub:
- Pendrokar/xvapitch_nvidia
- Pendrokar/xvapitch_expresso
- Pendrokar/TorchMoji
- Pendrokar/xvasynth_lojban
- Pendrokar/xvasynth_cabal
license: gpl-3.0
thumbnail: https://huggingface.co/spaces/Pendrokar/xVASynth/raw/main/thumbnail.png
short_description: CPU powered, low RTF, emotional, multilingual TTS
---

DanRuta's xVASynth, GitHub repo: [https://github.com/DanRuta/xVA-Synth](https://github.com/DanRuta/xVA-Synth)

Papers:
- VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech - https://arxiv.org/abs/2106.06103
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone - https://arxiv.org/abs/2112.02418

Referenced papers within code:
- Multi-head attention with Relative Positional Embedding - https://arxiv.org/pdf/1809.04281.pdf
- Transformer with Relative Positional Encoding - https://arxiv.org/abs/1803.02155
- SDP (Stochastic Duration Predictor) - https://arxiv.org/pdf/2106.06103.pdf
- Spline Flow - https://arxiv.org/abs/1906.04032

Extra:
- DeepMoji - https://arxiv.org/abs/1708.00524

xVA FastPitch:
- [1] [FastPitch: Parallel Text-to-speech with Pitch Prediction](https://arxiv.org/abs/2006.06873)
- [2] [One TTS Alignment To Rule Them All](https://arxiv.org/abs/2108.10447)

Used datasets: Unknown/Non-permissible data