Voxtral-4B-TTS-2603 (MLX 4bit)

MLX 4bit version of mistralai/Voxtral-4B-TTS-2603, a 4B-parameter multilingual text-to-speech model with 20 voice presets across 9 languages.

Size: ~2.5GB

Use with mlx-audio

pip install -U mlx-audio

from mlx_audio.tts.utils import load

model = load("mlx-community/Voxtral-4B-TTS-2603-mlx-4bit")

for result in model.generate(
    text="Hello, this is a test of Voxtral text-to-speech!",
    voice="casual_male",
):
    # result.audio is an mx.array of 24kHz audio samples
    print(f"Generated {result.audio_duration} of audio")
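
To keep the generated speech, the samples can be written to a WAV file. The following is a minimal sketch using only the Python standard library; `write_wav` is a hypothetical helper (not part of mlx-audio), and it assumes mono float samples in the range [-1.0, 1.0] at the 24kHz rate noted above. A synthetic tone stands in for `result.audio` so the sketch runs without downloading the model.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24_000  # the model card states 24kHz output


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples as 16-bit PCM and return the duration in seconds.

    Assumption: samples lie in [-1.0, 1.0]; values outside are clipped.
    """
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))
        pcm += struct.pack("<h", int(clipped * 32767))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return (len(pcm) // 2) / sample_rate


# Stand-in for result.audio: one second of a 440 Hz tone.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
out_path = os.path.join(tempfile.gettempdir(), "voxtral_test.wav")
duration = write_wav(out_path, tone)
print(f"Wrote {duration:.2f} s of audio to {out_path}")
```

Inside the generation loop, something like `write_wav("hello.wav", result.audio.tolist())` should work if `result.audio` is a one-dimensional float array. If `generate` yields more than one result, each chunk would need its own file, or the samples would need to be concatenated first.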

Available Voices

English: casual_male, casual_female, cheerful_female, neutral_male, neutral_female

French: fr_male, fr_female | Spanish: es_male, es_female | German: de_male, de_female

Italian: it_male, it_female | Portuguese: pt_male, pt_female | Dutch: nl_male, nl_female

Arabic: ar_male | Hindi: hi_male, hi_female
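
For scripts that pick a voice programmatically, the presets above can be kept in a small lookup table. This is only a convenience sketch: the `VOICES` dictionary and the `voices_for` helper are hypothetical names introduced here, not something mlx-audio provides. The preset strings themselves are copied from the list above.

```python
# Voice presets as listed on this model card, grouped by language.
VOICES = {
    "English": ["casual_male", "casual_female", "cheerful_female",
                "neutral_male", "neutral_female"],
    "French": ["fr_male", "fr_female"],
    "Spanish": ["es_male", "es_female"],
    "German": ["de_male", "de_female"],
    "Italian": ["it_male", "it_female"],
    "Portuguese": ["pt_male", "pt_female"],
    "Dutch": ["nl_male", "nl_female"],
    "Arabic": ["ar_male"],
    "Hindi": ["hi_male", "hi_female"],
}


def voices_for(language, gender=None):
    """Return the presets for a language, optionally filtered by 'male' or 'female'."""
    presets = VOICES[language]
    if gender is None:
        return list(presets)
    # Match on the "_<gender>" suffix so "male" does not also match "female".
    return [v for v in presets if v.endswith("_" + gender)]


print(voices_for("English", "male"))
print(voices_for("Hindi"))
```

A name returned by `voices_for` can then be passed as the `voice` argument in the usage example.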

Throughput (Apple Silicon)

Variant | Short RTF | Long RTF | Size
4-bit   | 0.97x     | 0.74x    | ~2.5GB
6-bit   | 1.15x     | 1.07x    | ~3.5GB
bf16    | 6.50x     | 6.32x    | ~8GB

RTF = Real-Time Factor: generation time divided by the duration of the audio produced (lower is faster, <1.0 = faster than real-time).
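
To reproduce this kind of measurement on your own machine, RTF can be computed as wall-clock generation time divided by the duration of the audio produced. The sketch below assumes that usual definition; the card does not say exactly how its figures were measured, and `real_time_factor` and `timed` are hypothetical helper names. A short sleep stands in for the model call.

```python
import time


def real_time_factor(generation_seconds, audio_seconds):
    """RTF = time spent generating / length of audio generated."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Illustrative numbers only: 7.4 s of compute for 10 s of audio.
rtf = real_time_factor(7.4, 10.0)
print(f"RTF {rtf:.2f} ({'faster' if rtf < 1.0 else 'slower'} than real-time)")

# Timing a stand-in workload; replace the lambda with the actual generation call.
_, elapsed = timed(lambda: time.sleep(0.05))
print(f"Stand-in call took {elapsed:.3f} s")
```

For a real measurement, the audio duration could be taken as the number of samples divided by 24000, given the 24kHz output rate.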
