gpt-oss-20b-Derestricted-mxfp4-mlx

MLX MXFP4 quantization of ArliAI/gpt-oss-20b-Derestricted.

Model Capabilities

  • Reasoning: Configurable effort (low/medium/high) via the reasoning_effort parameter
  • Tool Use: Native function calling support
  • Context: 131k tokens

Quantization Details

Matches OpenAI's original MXFP4 quantization scheme:

Component    Bits   Group Size   Format
MLP Experts  4      32           MXFP4
Attention    -      -            Full precision (bfloat16)
Routers      -      -            Full precision (bfloat16)
Embeddings   -      -            Full precision (bfloat16)
LM Head      -      -            Full precision (bfloat16)

Usage

mlx-lm

mlx_lm.chat --model txgsync/gpt-oss-20b-Derestricted-mxfp4-mlx
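
The model can also be driven from Python with mlx-lm's load/generate API. A minimal sketch (the prompt text is illustrative; the first call downloads the weights from the Hub):

```python
from mlx_lm import load, generate

# Load the quantized model and tokenizer from the Hugging Face Hub
model, tokenizer = load("txgsync/gpt-oss-20b-Derestricted-mxfp4-mlx")

# Apply the chat template so the model's expected prompt format is used
messages = [{"role": "user", "content": "Explain MXFP4 quantization in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate a response
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```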

LM Studio

For full Reasoning Effort support in LM Studio, install via:

lms get txgsync/gpt-oss-20b-derestricted

This downloads the model with the virtual model wrapper that enables the Reasoning Effort selector.

Notes

  • Requires a version of mlx-lm with support for the gpt_oss Hugging Face model format
  • Quantization matches OpenAI's modules_to_not_convert scheme for optimal quality
Model size: 21B params (safetensors; tensor types BF16 / U8 / U32)
Model tree for txgsync/gpt-oss-20b-Derestricted-mxfp4-mlx

Base model: openai/gpt-oss-20b (this model is one of its quantizations)