# Hybrid Naming Scheme & Benchmark Synopsis
This report summarizes baseline and hybrid quantization results for `Qwen3-30B-A3B-Instruct-2507-unsloth` as measured by the Magic Quant pipeline.
## Naming Scheme
Model variants follow a structured suffix convention that encodes both the base conversion mode and per-tensor quantization schemes.
| Suffix Example | Meaning |
| -------------- | ------- |
| `BF16` | Pure unquantized BF16 family baseline; the reference for all precision-loss comparisons. |
| `Q8_0`, `Q6_K`, `Q5_K`, `Q4_K_M`, `IQ4_NL`, `MXFP4_MOE` | Pure model-wide quantization baselines. |
| `iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K` | Base conversion mode `iq4_nl` with per-group schemes: embeddings (`emb_`), output head (`head_`), MoE router (`moe_rt_`). |
| `...-aq_F16-akv_Q8_0-fd_Q4_K-ao_Q5_K` | Extended sensitivity groups: Attention Q (`aq_`), Attention K+V (`akv_`), FFN Down (`fd_`), Attention Output (`ao_`). |
| `mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K` | MXFP4-centric hybrids with an MoE expert group (`moe_exp_`) and mixed IQ/Q schemes per tensor group. |
The result tables below additionally use `fug_`, which by analogy with `fd_` (FFN Down) denotes the FFN up/gate group. In general, anything after the base model name is a purely mechanical description of **how** the weights were transformed, not a new training run.
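The suffix can be split mechanically into the base mode and the per-group overrides. A minimal parsing sketch (the helper and regex are illustrative, not part of the Magic Quant pipeline; it relies on group names being lowercase and scheme names starting with an uppercase letter):

```python
import re

# Illustrative helper (not pipeline code): group prefixes are lowercase,
# scheme names start with an uppercase letter (Q4_K, IQ4_NL, BF16, ...).
GROUP_RE = re.compile(r"^([a-z][a-z_]*?)_([A-Z].*)$")

def parse_variant(suffix: str) -> tuple[str, dict[str, str]]:
    base_mode, *parts = suffix.split("-")   # first token is the base conversion mode
    overrides = {}
    for part in parts:
        m = GROUP_RE.match(part)
        if m:                               # e.g. "moe_rt_Q4_K" -> ("moe_rt", "Q4_K")
            overrides[m.group(1)] = m.group(2)
    return base_mode, overrides

# parse_variant("mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K")
# -> ("mxfp4_moe", {"emb": "IQ4_NL", "head": "Q6_K", "moe_exp": "MXFP4", "moe_rt": "Q6_K"})
```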
---
## Benchmark Methodology
All models were tested with a unified automated harness using `llama.cpp` tools.
**Included tests:**
- **Throughput:**
`llama-bench` with descending GPU offload (`-ngl 35 → 0`) and automatic OOM retry.
The highest successful TPS is recorded (a minimal retry sketch follows this list).
- **Perplexity:**
Three domains: **general**, **code**, **math**.
Each uses an auto-generated corpus of ~**32k tokens**.
Perplexity is computed with `llama-perplexity` at a **2048-token** context (a per-domain invocation sketch also follows this list).
Same GPU offload retry logic as above.
- **Precision loss:**
Each model is compared to its **family BF16 baseline**.
Precision-loss % is computed per PPL domain as the absolute relative change vs that baseline, plus an averaged score across the three domains.
Models are ranked by this averaged metric (a worked example follows the precision-loss table below).
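A minimal sketch of the two measurement steps above, assuming the `llama-bench` and `llama-perplexity` binaries are on `PATH`. The `-m`, `-ngl`, `-f` and `-c` flags are standard `llama.cpp` options; the retry step size and the output parsing are simplified assumptions, not the pipeline's actual code.

```python
import subprocess

# Throughput: walk -ngl down from 35 until a run no longer OOMs (simplified;
# the real harness may step differently and parses TPS from the report).
def bench_with_retry(model_path: str, start_ngl: int = 35) -> str | None:
    for ngl in range(start_ngl, -1, -1):
        result = subprocess.run(
            ["llama-bench", "-m", model_path, "-ngl", str(ngl)],
            capture_output=True, text=True,
        )
        if result.returncode == 0:        # first successful run -> highest offload that fits
            return result.stdout          # TPS is read from this report (parsing omitted)
    return None
```

Perplexity follows the same pattern, run once per domain corpus (continuing the same sketch):

```python
# Perplexity: one run per ~32k-token domain corpus at a 2048-token context.
def run_perplexity(model_path: str, corpus_path: str, ngl: int = 35) -> str:
    result = subprocess.run(
        ["llama-perplexity",
         "-m", model_path,
         "-f", corpus_path,               # e.g. corpus_general.txt (placeholder name)
         "-c", "2048",
         "-ngl", str(ngl)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout + result.stderr  # final PPL line is parsed from the log (omitted)

# for domain in ("general", "code", "math"):
#     run_perplexity("model.gguf", f"corpus_{domain}.txt")
```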
---
### Table - Overview of Results
Size reduction and throughput change relative to the family BF16 baseline.
| model_name | size_reduction | tps_change |
| ---------- | -------------- | ---------- |
| iq4_nl-akv_Q8_0-ao_Q8_0-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 46.84% | 124.10% |
| Q5_K | 64.45% | 163.87% |
| mxfp4_moe-akv_Q5_K-ao_Q5_K-aq_Q6_K-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL-head_BF16-moe_rt_IQ4_NL | 66.73% | 148.52% |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_Q8_0-moe_rt_Q8_0 | 63.29% | 166.48% |
| IQ4_NL | 71.42% | 211.80% |
| iq4_nl-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 71.81% | 236.69% |
* All percentages are computed against the family BF16 baseline: `size_reduction` is the relative file-size saving, `tps_change` is the relative throughput gain.
---
### Table - File Size + TPS + Avg Precision Loss
| model_name | file_size_gb | bench_tps | avg_prec_loss |
| ---------- | ------------ | --------- | ------------- |
| BF16 | 56.90 | 44.48 | 0.0000% |
| iq4_nl-akv_Q8_0-ao_Q8_0-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 30.25 | 99.68 | 0.0771% |
| Q5_K | 20.23 | 117.37 | 0.2007% |
| mxfp4_moe-akv_Q5_K-ao_Q5_K-aq_Q6_K-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL-head_BF16-moe_rt_IQ4_NL | 18.93 | 110.54 | 0.3929% |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_Q8_0-moe_rt_Q8_0 | 20.89 | 118.53 | 0.3939% |
| IQ4_NL | 16.26 | 138.69 | 0.4198% |
| iq4_nl-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 16.04 | 149.76 | 2.6323% |
* `avg_prec_loss` is the mean of the absolute per-domain precision-loss percentages (general, code, math) vs the family BF16 baseline.
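For reference, the overview-table percentages are reproducible from this table; a small illustrative check, with values hard-coded from the BF16 and Q5_K rows:

```python
# Illustrative check: derive the Q5_K overview percentages from this table.
bf16_size_gb, bf16_tps = 56.90, 44.48
q5k_size_gb, q5k_tps = 20.23, 117.37

size_reduction = (1 - q5k_size_gb / bf16_size_gb) * 100  # file-size saving vs BF16
tps_change = (q5k_tps / bf16_tps - 1) * 100              # throughput gain vs BF16

print(f"{size_reduction:.2f}%  {tps_change:.2f}%")       # ~64.45%  ~163.87%
```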
---
### Table - PPL Columns
| model_name | gen | gen_er | code | code_er | math | math_er |
| ---------- | --- | ------ | ---- | ------- | ---- | ------- |
| BF16 | 6.2581 | 0.1279 | 1.2981 | 0.0072 | 5.7092 | 0.1064 |
| iq4_nl-akv_Q8_0-ao_Q8_0-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 6.2536 | 0.1277 | 1.2991 | 0.0072 | 5.7045 | 0.1063 |
| Q5_K | 6.2777 | 0.1283 | 1.3006 | 0.0073 | 5.7037 | 0.1062 |
| mxfp4_moe-akv_Q5_K-ao_Q5_K-aq_Q6_K-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL-head_BF16-moe_rt_IQ4_NL | 6.2854 | 0.1284 | 1.3036 | 0.0072 | 5.7274 | 0.1068 |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_Q8_0-moe_rt_Q8_0 | 6.2759 | 0.1276 | 1.3042 | 0.0072 | 5.6848 | 0.1050 |
| IQ4_NL | 6.2669 | 0.1274 | 1.3111 | 0.0073 | 5.7159 | 0.1061 |
| iq4_nl-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 6.4836 | 0.1337 | 1.3170 | 0.0075 | 5.8712 | 0.1099 |
* gen = ppl_general, code = ppl_code, math = ppl_math; the `_er` columns are the corresponding error estimates reported by the perplexity run.
---
### Table - Precision Loss Columns
| model_name | loss_general | loss_code | loss_math |
| ---------- | ------------ | --------- | --------- |
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| iq4_nl-akv_Q8_0-ao_Q8_0-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 0.0719 | 0.0770 | 0.0823 |
| Q5_K | 0.3132 | 0.1926 | 0.0963 |
| mxfp4_moe-akv_Q5_K-ao_Q5_K-aq_Q6_K-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL-head_BF16-moe_rt_IQ4_NL | 0.4362 | 0.4237 | 0.3188 |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_Q8_0-moe_rt_Q8_0 | 0.2844 | 0.4699 | 0.4274 |
| IQ4_NL | 0.1406 | 1.0015 | 0.1174 |
| iq4_nl-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 3.6033 | 1.4560 | 2.8375 |
* loss_* values are absolute precision-loss % vs BF16 per domain.
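The loss columns follow directly from the PPL table; a worked check for the IQ4_NL row (values hard-coded from the tables above, helper purely illustrative):

```python
# Illustrative check: reproduce the IQ4_NL loss row and its average from the PPL table.
bf16   = {"general": 6.2581, "code": 1.2981, "math": 5.7092}
iq4_nl = {"general": 6.2669, "code": 1.3111, "math": 5.7159}

losses = {d: abs(iq4_nl[d] - bf16[d]) / bf16[d] * 100 for d in bf16}
avg_loss = sum(losses.values()) / len(losses)

print(losses)              # ~{'general': 0.1406, 'code': 1.0015, 'math': 0.1174}
print(f"{avg_loss:.4f}%")  # ~0.4198% -> the avg_prec_loss column
```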