
Hybrid Naming Scheme & Benchmark Synopsis

This report summarizes baseline and hybrid quantization results for Qwen3-30B-A3B-Instruct-2507-unsloth as measured by the Magic Quant pipeline.

Naming Scheme

Model variants follow a structured suffix convention that encodes both the base conversion mode and per-tensor quantization schemes.

| Suffix example | Meaning |
|---|---|
| BF16 | Pure full-precision family baseline (no quantization). |
| Q8_0, Q6_K, Q5_K, Q4_K_M, IQ4_NL, MXFP4_MOE | Pure model-wide quantization baselines. |
| iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K | Base conversion mode iq4_nl with per-group schemes: embeddings (emb_), output head (head_), MoE router (moe_rt_). |
| ...-aq_F16-akv_Q8_0-fd_Q4_K-ao_Q5_K | Extended sensitivity groups: Attention Q (aq_), Attention K+V (akv_), FFN Down (fd_), Attention Output (ao_). |
| mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K | MXFP4-centric hybrids with the MoE expert group (moe_exp_) and mixed IQ/Q schemes per tensor group. |

In general, anything after the base model name is a purely mechanical description of how the weights were transformed, not a new training run.
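
As a concrete illustration of the convention, the sketch below splits a variant name into its base conversion mode and per-group schemes. The helper and the fixed prefix list are assumptions derived from the table above, not code from the Magic Quant pipeline.

```python
# Illustrative only: parse_variant and GROUP_PREFIXES are assumptions based on
# the suffix table above, not part of the Magic Quant pipeline itself.

# Tensor-group prefixes seen in this report ("fug" appears in the result
# tables below and is parsed the same way as the documented groups).
GROUP_PREFIXES = ("moe_exp", "moe_rt", "emb", "head", "aq", "akv", "fd", "ao", "fug")

def parse_variant(name: str) -> tuple[str, dict[str, str]]:
    """Split a variant suffix into (base_mode, {tensor_group: scheme})."""
    base, *parts = name.split("-")
    groups: dict[str, str] = {}
    for part in parts:
        for prefix in GROUP_PREFIXES:
            if part.startswith(prefix + "_"):
                groups[prefix] = part[len(prefix) + 1:]
                break
        else:
            raise ValueError(f"unrecognised tensor-group segment: {part}")
    return base, groups

# parse_variant("mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K")
# -> ("mxfp4_moe", {"emb": "IQ4_NL", "head": "Q6_K", "moe_exp": "MXFP4", "moe_rt": "Q6_K"})
```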


Benchmark Methodology

All models were tested with a unified automated harness using llama.cpp tools.

Included tests:

  • Throughput:
    llama-bench with descending GPU offload (-ngl 35 → 0) and automatic OOM retry.
    The highest TPS (tokens per second) from a successful run is recorded; a minimal retry sketch follows this list.

  • Perplexity:
    Three domains: general, code, math.
    Each uses an auto-generated corpus of ~32k tokens.
    Perplexity is computed with llama-perplexity at 2048-token context.
    Same GPU retry logic as above.

  • Precision loss:
    Each model is compared to its family BF16 baseline.
    Precision-loss % is computed for all PPL domains, plus an averaged score.
    Models are ranked by this metric.
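
The sketch below illustrates the descending-offload retry described in the throughput step, assuming the stock llama-bench binary from llama.cpp. The 5-layer step size, the "any non-zero exit counts as OOM" policy, and the bench_with_retry helper are illustrative assumptions rather than the pipeline's exact logic.

```python
# A minimal sketch of the descending-offload retry loop, assuming the stock
# llama-bench binary from llama.cpp. The 5-layer step and "any non-zero exit
# counts as OOM" policy are illustrative assumptions, not the pipeline's logic.
import subprocess

def bench_with_retry(model_path: str, start_ngl: int = 35, step: int = 5) -> int | None:
    """Run llama-bench, lowering -ngl until a run succeeds; return the -ngl used."""
    ngl = start_ngl
    while ngl >= 0:
        result = subprocess.run(
            ["llama-bench", "-m", model_path, "-ngl", str(ngl)],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            # Parsing the tokens/s figures out of result.stdout is omitted here;
            # the harness keeps the highest TPS from the successful run.
            return ngl
        ngl -= step  # treat the failure as an OOM and offload fewer layers
    return None      # even -ngl 0 (no GPU offload) failed
```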


Table - Overview of Results

Compared against the BF16 baseline.

| model_name | size_reduction | tps_change |
|---|---|---|
| iq4_nl-akv_Q8_0-ao_Q8_0-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 46.84% | 124.10% |
| Q5_K | 64.45% | 163.87% |
| mxfp4_moe-akv_Q5_K-ao_Q5_K-aq_Q6_K-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL-head_BF16-moe_rt_IQ4_NL | 66.73% | 148.52% |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_Q8_0-moe_rt_Q8_0 | 63.29% | 166.48% |
| IQ4_NL | 71.42% | 211.80% |
| iq4_nl-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 71.81% | 236.69% |
  • All percentages compared against the selected family BF16 baseline.
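
Both derived columns can be reproduced from the raw file_size_gb and bench_tps values in the next table; the helpers below sketch that arithmetic (function names are illustrative, not the pipeline's own).

```python
# Sketch of how the derived columns can be reproduced from file_size_gb and
# bench_tps in the next table (helper names are illustrative).
def size_reduction_pct(quant_gb: float, bf16_gb: float) -> float:
    """Disk space saved relative to the BF16 file, in percent."""
    return (1.0 - quant_gb / bf16_gb) * 100.0

def tps_change_pct(quant_tps: float, bf16_tps: float) -> float:
    """Throughput gain relative to the BF16 baseline, in percent."""
    return (quant_tps / bf16_tps - 1.0) * 100.0

# Example, IQ4_NL row: size_reduction_pct(16.26, 56.90) ≈ 71.42
#                      tps_change_pct(138.69, 44.48)   ≈ 211.80
```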

Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
|---|---|---|---|
| BF16 | 56.90 | 44.48 | 0.0000% |
| iq4_nl-akv_Q8_0-ao_Q8_0-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 30.25 | 99.68 | 0.0771% |
| Q5_K | 20.23 | 117.37 | 0.2007% |
| mxfp4_moe-akv_Q5_K-ao_Q5_K-aq_Q6_K-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL-head_BF16-moe_rt_IQ4_NL | 18.93 | 110.54 | 0.3929% |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_Q8_0-moe_rt_Q8_0 | 20.89 | 118.53 | 0.3939% |
| IQ4_NL | 16.26 | 138.69 | 0.4198% |
| iq4_nl-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 16.04 | 149.76 | 2.6323% |
  • avg_prec_loss is the averaged absolute precision-loss % vs BF16.

Table - PPL Columns

| model_name | gen | gen_er | code | code_er | math | math_er |
|---|---|---|---|---|---|---|
| BF16 | 6.2581 | 0.1279 | 1.2981 | 0.0072 | 5.7092 | 0.1064 |
| iq4_nl-akv_Q8_0-ao_Q8_0-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 6.2536 | 0.1277 | 1.2991 | 0.0072 | 5.7045 | 0.1063 |
| Q5_K | 6.2777 | 0.1283 | 1.3006 | 0.0073 | 5.7037 | 0.1062 |
| mxfp4_moe-akv_Q5_K-ao_Q5_K-aq_Q6_K-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL-head_BF16-moe_rt_IQ4_NL | 6.2854 | 0.1284 | 1.3036 | 0.0072 | 5.7274 | 0.1068 |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_Q8_0-moe_rt_Q8_0 | 6.2759 | 0.1276 | 1.3042 | 0.0072 | 5.6848 | 0.1050 |
| IQ4_NL | 6.2669 | 0.1274 | 1.3111 | 0.0073 | 5.7159 | 0.1061 |
| iq4_nl-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 6.4836 | 0.1337 | 1.3170 | 0.0075 | 5.8712 | 0.1099 |
  • gen = ppl_general, code = ppl_code, math = ppl_math; the *_er columns are the error estimates reported alongside each perplexity value.

Table - Precision Loss Columns

| model_name | loss_general | loss_code | loss_math |
|---|---|---|---|
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| iq4_nl-akv_Q8_0-ao_Q8_0-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 0.0719 | 0.0770 | 0.0823 |
| Q5_K | 0.3132 | 0.1926 | 0.0963 |
| mxfp4_moe-akv_Q5_K-ao_Q5_K-aq_Q6_K-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL-head_BF16-moe_rt_IQ4_NL | 0.4362 | 0.4237 | 0.3188 |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_Q8_0-moe_rt_Q8_0 | 0.2844 | 0.4699 | 0.4274 |
| IQ4_NL | 0.1406 | 1.0015 | 0.1174 |
| iq4_nl-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 3.6033 | 1.4560 | 2.8375 |
  • loss_* values are absolute precision-loss % vs BF16 per domain.
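
These loss columns follow directly from the PPL table above: each value is the absolute relative change in perplexity versus the BF16 baseline, and avg_prec_loss in the earlier table is the mean of the three domain losses. A small sketch of that arithmetic (helper names are chosen for illustration):

```python
# Sketch of the precision-loss arithmetic: each loss_* value is the absolute
# relative PPL change vs BF16, and avg_prec_loss is the mean of the three
# domain losses (helper names are illustrative).
def precision_loss_pct(ppl_quant: float, ppl_bf16: float) -> float:
    """Absolute perplexity change relative to the BF16 baseline, in percent."""
    return abs(ppl_quant - ppl_bf16) / ppl_bf16 * 100.0

def avg_precision_loss(losses: list[float]) -> float:
    """Average of the per-domain losses (general, code, math)."""
    return sum(losses) / len(losses)

# Example, using the all-Q8_0 hybrid row from the PPL table:
# precision_loss_pct(6.2536, 6.2581) ≈ 0.0719  (loss_general)
# precision_loss_pct(1.2991, 1.2981) ≈ 0.0770  (loss_code)
# precision_loss_pct(5.7045, 5.7092) ≈ 0.0823  (loss_math)
# avg_precision_loss([0.0719, 0.0770, 0.0823]) ≈ 0.0771, matching avg_prec_loss above.
```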