Want to build together? Realistic Conversational Model

#6
by Cyrus21 - opened

We're from Clevr Labs. We build conversational voice models.

We've built many projects. One of our most recent AI voice projects, theclevr.com, gave us more insight into a hard problem the speech industry faces: capturing timbre, pitch, and prosody, especially across different languages.

Conversational Voice Model

So we are building a high-quality speech model specifically for human-like conversation. It not only converts text to speech but also sounds natural and mimics high-fidelity speech.

Here is a sample of the model - https://youtu.be/GHx3N0o5cgQ

The Role

We're looking for anyone with the following experience and characteristics:

  • Some experience building models and a good technical understanding of different architectures such as transformers and diffusion models
  • Tech stack: Hugging Face, WandB, PyTorch, Python, Pandas, NumPy, Git (everything else is a plus; the more the merrier)
  • Quick learners: the job demands quite a lot of research, since the field changes rapidly

You'll work with us on developing and improving an autoregressive speech model designed to produce natural and expressive speech.

The Work Involves

Preparing and preprocessing datasets, aligning text with audio, adjusting tokenizers, and optimizing training loops while evaluating model performance. Along the way, you'll gain hands-on experience with how architecture choices influence aspects like prosody, timbre, and pacing.

The role combines technical research with practical engineering, offering a space to learn, iterate, and contribute directly to shaping the next generation of speech synthesis.

If you'd like to work on the future of conversational speech, reach out to us at

[email protected]

NineNineSix org

Nice to meet you, you have a great project.))