Qwen3-VL-235B-A22B-Instruct — MLX mxfp8

MLX-format conversion of Qwen/Qwen3-VL-235B-A22B-Instruct (BF16 full precision) for Apple Silicon inference.

Quantization

Parameter        Value
Format           MLX safetensors
Quantization     mxfp8
Bits per weight  8.269
Group size       32
Shards           48 × ~5.1 GB
Total size       ~243 GB
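The bits-per-weight and total-size figures above are mutually consistent. A minimal sketch of the arithmetic, assuming the mxfp8 layout stores one shared 8-bit scale per group of 32 eight-bit elements (the exact metadata layout is an assumption, not stated in this card):

```python
# Effective bits per weight for a block-scaled 8-bit format:
# each group of 32 weights stores 32 x 8-bit elements plus one
# shared 8-bit scale (assumed layout).
GROUP_SIZE = 32
core_bits = 8 + 8 / GROUP_SIZE   # 8.25 bits/weight for quantized tensors
print(core_bits)                 # 8.25

# The card reports 8.269 bits/weight overall, slightly above 8.25
# because some tensors remain in higher precision. Checkpoint size:
params = 235e9
size_gb = params * 8.269 / 8 / 1e9
print(round(size_gb))            # 243
```

The ~243 GB result matches the shard total (48 × ~5.1 GB).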

Usage

pip install mlx-vlm

# Text generation
python -m mlx_vlm generate \
    --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp8 \
    --prompt "What model are you?" \
    --max-tokens 128

# Vision
python -m mlx_vlm generate \
    --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp8 \
    --image photo.jpg \
    --prompt "Describe this image in detail." \
    --max-tokens 256

Hardware Requirements

  • Apple Silicon with ≥256 GB unified memory (tested on M3 Ultra 512 GB)
  • macOS 15+, MLX 0.30.4+

Model Details

  • Architecture: Qwen3-VL (Vision-Language Model) with Mixture of Experts (128 experts, top-k routing)
  • Parameters: 235B total, ~22B active per token
  • Capabilities: Text, image, and video understanding
  • Source: Converted from the BF16 full-precision checkpoint using a patched mlx-vlm with per-tensor materialization to avoid Metal GPU timeouts on large models
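With a Mixture of Experts, only a small fraction of the 235B parameters is active per token: a router scores all 128 experts and dispatches each token to its top-k, so roughly ~22B parameters do work per step. A toy sketch of top-k routing (k=8 is illustrative; the card does not state Qwen3-VL's actual k):

```python
import math
import random

def route(logits, k=8):
    """Select the top-k experts by router logit and softmax-normalize
    their weights so the selected experts' gates sum to 1."""
    topk = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in topk]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(topk, exps)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(128)]  # one router score per expert
chosen = route(logits, k=8)
print(len(chosen))  # 8 experts active out of 128 for this token
```

Each token's output is then the gate-weighted sum of its chosen experts' outputs; the other 120 experts are skipped entirely, which is what keeps the active parameter count low.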

Conversion

Converted with mlx-vlm (patched for 235B+ model support):

python -m mlx_vlm convert \
    --hf-path Qwen/Qwen3-VL-235B-A22B-Instruct \
    -q --q-bits 8 --q-mode mxfp8 --q-group-size 32 \
    --mlx-path Qwen3-VL-235B-A22B-Instruct-mlx-mxfp8

Models larger than ~100B parameters require a patch: weights are materialized lazily, one tensor at a time, before quantization to prevent a Metal command-buffer timeout. See LibraxisAI/mlx-vlm for the fixes.


Created by M&K (c)2026 The LibraxisAI Team
