Want to build together? Realistic Conversational Model
We're from Clevr Labs, and we build conversational voice models.
We've built many projects, and theclevr.com, one of our most recent AI voice projects, gave us more insight into a hard problem the speech industry faces: capturing timbre, pitch, and prosody, especially across different languages.
Conversational Voice Model
So we are building a high-quality speech model specifically for human-like conversation. Not only does it convert text to speech, but the output also feels natural and mimics high-fidelity speech.
Here is a sample of the model - https://youtu.be/GHx3N0o5cgQ
The Role
We're looking for anyone with the following experience and characteristics:
- Some experience building models and a good technical understanding of different architectures such as transformers and diffusion models
- Tech stack: Hugging Face, WandB, PyTorch, Python, Pandas, NumPy, Git (everything else is a plus; the more the merrier)
- Quick learners: the job demands a lot of research, since the field changes rapidly
 
You'll work with us on developing and improving an autoregressive speech model designed to produce natural and expressive speech.
The Work Involves
You'll prepare and preprocess datasets, align text with audio, adjust tokenizers, and optimize training loops while evaluating model performance. Along the way, you'll gain hands-on experience with how architecture choices influence aspects like prosody, timbre, and pacing.
The role combines technical research with practical engineering, offering a space to learn, iterate, and contribute directly to shaping the next generation of speech synthesis.
If you'd like to work on the future of conversational speech, reach out to us at
Nice to meet you, you have a great project.))