Qwen3-VL-235B-A22B-Instruct — MLX mxfp8

MLX-format conversion of Qwen/Qwen3-VL-235B-A22B-Instruct (BF16 full precision) for Apple Silicon inference.

Quantization

Parameter        Value
Format           MLX safetensors
Quantization     mxfp8
Bits per weight  8.269
Group size       32
Shards           48 × ~5.1 GB
Total size       ~243 GB
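The bits-per-weight and total-size figures above are mutually consistent. A minimal sketch of the arithmetic, assuming the mxfp8 layout stores one shared 8-bit scale per group of 32 eight-bit elements (the exact metadata layout is an assumption, not stated in this card):

```python
# Effective bits per weight for a block-scaled 8-bit format:
# each group of 32 weights stores 32 x 8-bit elements plus one
# shared 8-bit scale (assumed layout).
GROUP_SIZE = 32
core_bits = 8 + 8 / GROUP_SIZE   # 8.25 bits/weight for quantized tensors
print(core_bits)                 # 8.25

# The card reports 8.269 bits/weight overall, slightly above 8.25
# because some tensors remain in higher precision. Checkpoint size:
params = 235e9
size_gb = params * 8.269 / 8 / 1e9
print(round(size_gb))            # 243
```

The ~243 GB result matches the shard total (48 × ~5.1 GB).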

Usage

pip install mlx-vlm

# Text generation
python -m mlx_vlm generate \
    --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp8 \
    --prompt "What model are you?" \
    --max-tokens 128

# Vision
python -m mlx_vlm generate \
    --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp8 \
    --image photo.jpg \
    --prompt "Describe this image in detail." \
    --max-tokens 256

Hardware Requirements

  • Apple Silicon with ≥256 GB unified memory (tested on M3 Ultra 512 GB)
  • macOS 15+, MLX 0.30.4+

Model Details

  • Architecture: Qwen3-VL (Vision-Language Model) with Mixture of Experts (128 experts, top-k routing)
  • Parameters: 235B total, ~22B active per token
  • Capabilities: Text, image, and video understanding
  • Source: Converted from the BF16 full-precision checkpoint using a patched mlx-vlm with per-tensor materialization to avoid Metal GPU timeouts on large models
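With a Mixture of Experts, only a small fraction of the 235B parameters is active per token: a router scores all 128 experts and dispatches each token to its top-k, so roughly ~22B parameters do work per step. A toy sketch of top-k routing (k=8 is illustrative; the card does not state Qwen3-VL's actual k):

```python
import math
import random

def route(logits, k=8):
    """Select the top-k experts by router logit and softmax-normalize
    their weights so the selected experts' gates sum to 1."""
    topk = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in topk]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(topk, exps)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(128)]  # one router score per expert
chosen = route(logits, k=8)
print(len(chosen))  # 8 experts active out of 128 for this token
```

Each token's output is then the gate-weighted sum of its chosen experts' outputs; the other 120 experts are skipped entirely, which is what keeps the active parameter count low.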

Conversion

Converted with mlx-vlm (patched for 235B+ model support):

python -m mlx_vlm convert \
    --hf-path Qwen/Qwen3-VL-235B-A22B-Instruct \
    -q --q-bits 8 --q-mode mxfp8 --q-group-size 32 \
    --mlx-path Qwen3-VL-235B-A22B-Instruct-mlx-mxfp8

Models larger than ~100B parameters require a patch: weights are materialized lazily, one tensor at a time, before quantization to prevent a Metal command-buffer timeout. See LibraxisAI/mlx-vlm for the fixes.


Created by M&K (c)2026 The LibraxisAI Team
