# gpt-oss-20b-Derestricted-mxfp4-mlx

An MLX MXFP4 quantization of ArliAI/gpt-oss-20b-Derestricted.
## Model Capabilities

- Reasoning: Configurable effort (low/medium/high) via the `reasoning_effort` parameter
- Tool Use: Native function calling support
- Context: 131k tokens
## Quantization Details

This matches OpenAI's original MXFP4 quantization scheme:
| Component | Bits | Group Size | Format |
|---|---|---|---|
| MLP Experts | 4 | 32 | MXFP4 |
| Attention | - | - | Full precision (bfloat16) |
| Routers | - | - | Full precision (bfloat16) |
| Embeddings | - | - | Full precision (bfloat16) |
| LM Head | - | - | Full precision (bfloat16) |
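To make the table concrete, here is a minimal sketch of how MXFP4 block quantization works, following the OCP Microscaling (MX) format: each group of 32 weights shares one power-of-two scale, and each element is stored as a 4-bit E2M1 float. This is an illustration of the format, not the kernel mlx-lm actually uses.

```python
# Illustrative MXFP4 (microscaling FP4) quantization sketch.
# Assumption: OCP MX layout -- a shared power-of-two (E8M0) scale per
# block of up to 32 values, elements stored as signed 4-bit E2M1.
import math

# The eight positive magnitudes representable in E2M1
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block (<=32 floats) to (shared_exponent, fp4 codes)."""
    amax = max(abs(v) for v in block)
    # Pick a power-of-two scale so the largest magnitude lands near
    # the E2M1 maximum (6.0).
    exp = math.floor(math.log2(amax / 6.0)) if amax > 0 else 0
    scale = 2.0 ** exp
    codes = []
    for v in block:
        mag = min(abs(v) / scale, 6.0)
        # Round to the nearest representable E2M1 magnitude
        code = min(range(8), key=lambda i: abs(FP4_E2M1[i] - mag))
        codes.append((v < 0, code))
    return exp, codes

def dequantize_block(exp, codes):
    scale = 2.0 ** exp
    return [(-1.0 if neg else 1.0) * FP4_E2M1[code] * scale
            for neg, code in codes]

exp, codes = quantize_block([0.1, -0.5, 2.0, 6.3])
print(dequantize_block(exp, codes))  # → [0.0, -0.5, 2.0, 6.0]
```

Because the scale is a bare power of two, dequantization is a shift rather than a multiply, which is why the MLP expert weights compress to roughly 4.25 bits per parameter while attention, routers, and embeddings stay in bfloat16.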
## Usage

### mlx-lm

```shell
mlx_lm.chat --model txgsync/gpt-oss-20b-Derestricted-mxfp4-mlx
```
### LM Studio

For full Reasoning Effort support in LM Studio, install via:

```shell
lms get txgsync/gpt-oss-20b-derestricted
```
This downloads the model with the virtual model wrapper that enables the Reasoning Effort selector.
## Notes

- Requires mlx-lm with gpt_oss HF format support
- Quantization matches OpenAI's `modules_to_not_convert` scheme for optimal quality