# Qwen3-VL-235B-A22B-Instruct – MLX mxfp8
MLX-format conversion of Qwen/Qwen3-VL-235B-A22B-Instruct (BF16 full precision) for Apple Silicon inference.
## Quantization
| Parameter | Value |
|---|---|
| Format | MLX safetensors |
| Quantization | mxfp8 |
| Bits per weight | 8.269 |
| Group size | 32 |
| Shards | 48 × ~5.1 GB |
| Total size | ~243 GB |
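The 8.269 figure is consistent with the format: mxfp8 stores 8-bit elements plus one shared 8-bit scale per group of 32, i.e. a baseline of 8.25 bits/weight, with the remainder coming from metadata and tensors kept in higher precision. A quick sanity check:

```python
# mxfp8 storage cost: 8-bit elements + one 8-bit shared scale per group
group_size = 32
baseline_bpw = 8 + 8 / group_size  # 8.25 bits per weight

# Total size implied by 235B parameters at the reported effective rate
reported_bpw = 8.269
total_gb = 235e9 * reported_bpw / 8 / 1e9  # bits -> bytes -> GB

print(baseline_bpw, round(total_gb))  # 8.25 243
```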
## Usage
```bash
pip install mlx-vlm
```

```bash
# Text generation
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp8 \
  --prompt "What model are you?" \
  --max-tokens 128
```

```bash
# Vision
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp8 \
  --image photo.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 256
```
## Hardware Requirements
- Apple Silicon with ≥256 GB unified memory (tested on M3 Ultra 512 GB)
- macOS 15+, MLX 0.30.4+
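The headroom above the ~243 GB of weights mostly covers the KV cache and activations; with grouped-query attention the cache stays small. A rough estimate (the layer count, KV-head count, and head dimension below are assumed values typical of the Qwen3 MoE family, not read from this checkpoint's config):

```python
# Rough KV-cache sizing; architecture numbers are assumptions - check config.json
n_layers = 94       # assumed transformer depth
n_kv_heads = 4      # assumed GQA key/value heads
head_dim = 128      # assumed per-head dimension
bytes_per_elem = 2  # bf16 cache entries
context = 32_768    # tokens

# Factor of 2 covers keys and values
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
kv_gb = kv_bytes_per_token * context / 1e9

print(kv_bytes_per_token, round(kv_gb, 1))  # 192512 6.3
```

Even a 32k-token context adds only a few GB on top of the weights under these assumptions, which is why 256 GB of unified memory suffices.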
## Model Details
- Architecture: Qwen3-VL (Vision-Language Model) with Mixture of Experts (128 experts, top-k routing)
- Parameters: 235B total, ~22B active per token
- Capabilities: Text, image, and video understanding
- Source: Converted from BF16 full precision checkpoint using patched mlx-vlm with per-tensor materialization to avoid Metal GPU timeout on large models
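The total/active split follows from the routing arithmetic. Assuming top-8 routing over the 128 experts (an assumption; the exact top-k is not stated here), solving dense + experts = 235B together with dense + (8/128)·experts = 22B gives roughly 227B of expert parameters and 8B of dense/shared parameters:

```python
# Back-of-envelope MoE parameter split; top_k = 8 is an assumed value
total_b, active_b = 235.0, 22.0
n_experts, top_k = 128, 8
frac = top_k / n_experts  # share of expert parameters active per token

# Solve: dense + experts = total; dense + frac * experts = active
experts_b = (total_b - active_b) / (1 - frac)
dense_b = total_b - experts_b
print(round(experts_b, 1), round(dense_b, 1))  # 227.2 7.8
```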
## Conversion
Converted with mlx-vlm (patched for 235B+ model support):
```bash
python -m mlx_vlm convert \
  --hf-path Qwen/Qwen3-VL-235B-A22B-Instruct \
  -q --q-bits 8 --q-mode mxfp8 --q-group-size 32 \
  --mlx-path Qwen3-VL-235B-A22B-Instruct-mlx-mxfp8
```
Patches are required for models above ~100B parameters: weights are materialized lazily, one tensor at a time, before quantization to prevent Metal command-buffer timeouts. See LibraxisAI/mlx-vlm for the fixes.
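The per-tensor strategy and the group-scale format can be illustrated with a toy NumPy sketch (not the actual mlx-vlm code; real mxfp8 uses E4M3 elements with an E8M0 shared scale, simulated here by 3-mantissa-bit rounding and a power-of-two group scale):

```python
import numpy as np

GROUP_SIZE = 32

def round_to_3_mantissa_bits(x):
    """Toy stand-in for the FP8 (E4M3) element cast: keep 3 mantissa bits."""
    exp = np.floor(np.log2(np.maximum(np.abs(x), 1e-30)))
    step = 2.0 ** (exp - 3)
    return np.round(x / step) * step

def quantize_group(w):
    """Shared power-of-two scale per group of 32, mxfp8-style (toy version)."""
    flat = w.reshape(-1, GROUP_SIZE).astype(np.float64)
    max_abs = np.maximum(np.abs(flat).max(axis=1, keepdims=True), 1e-30)
    scale = 2.0 ** -np.floor(np.log2(max_abs))  # brings each group near [-1, 1]
    q = round_to_3_mantissa_bits(flat * scale)
    return q, scale

def dequantize(q, scale, shape):
    return (q / scale).reshape(shape)

# Process tensors one at a time, so only one full-precision tensor is live -
# this is the idea behind the per-tensor materialization patch
rng = np.random.default_rng(0)
weights = {f"layer{i}.weight": rng.standard_normal((64, 64)) for i in range(3)}
quantized = {name: quantize_group(w) + (w.shape,) for name, w in weights.items()}

# Reconstruction error is bounded by the 3-bit mantissa: at most 2**-4 relative
w = weights["layer0.weight"]
w_hat = dequantize(*quantized["layer0.weight"])
rel_err = np.max(np.abs(w - w_hat) / np.maximum(np.abs(w), 1e-12))
print(rel_err <= 2**-4)  # True
```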
Created by M&K (c)2026 The LibraxisAI Team