---
base_model:
- stepfun-ai/Step-3.5-Flash
---

This repo contains specialized MoE quants for Step-3.5-Flash. Because the FFN tensors are far larger than the rest of the tensors in the model, quantizing them more aggressively while leaving everything else at high quality should give better quality at a smaller overall size than a comparable naive quantization. To that end, the default quantization type is kept at high quality, and the FFN UP, FFN GATE, and FFN DOWN tensors are quantized down.

| Quant | Size | Mixture | PPL | Mean PPL(Q)/PPL(base) - 1 | KLD |
| :--------- | :--------- | :------- | :------- | :------- | :------- |
| Q4_K_M | 113.82 GiB (4.96 BPW) | Q8_0 / Q4_K / Q4_K / Q5_K | 4.718049 ± 0.030373 | +0.3762% | 0.015464 ± 0.000133 |
| IQ4_XS | 88.90 GiB (3.88 BPW) | Q8_0 / IQ3_S / IQ3_S / IQ4_XS | 4.822499 ± 0.031236 | +2.5984% | 0.042753 ± 0.000301 |
| IQ3_XXS | 73.10 GiB (3.19 BPW) | Q6_K / IQ3_XXS / IQ3_XXS / IQ3_XXS | 4.882908 ± 0.031560 | +3.8836% | 0.078681 ± 0.000506 |

![kld_graph](kld_data/01_kld_vs_filesize.png "Chart showing Pareto KLD analysis of quants")

![ppl_graph](kld_data/02_ppl_vs_filesize.png "Chart showing Pareto PPL analysis of quants")
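
As a rough sanity check on the table, the sketch below works backwards from its figures. It assumes that BPW means file size in bits divided by the number of weights, and that the percentage column reports how much higher each quant's PPL is than the unquantized baseline's. The function names are made up for illustration, and the derived weight count and baseline PPL are estimates that do not appear anywhere in this repo.

```python
GIB = 2**30  # bytes per GiB

# (quant, size in GiB, bits per weight, PPL, relative PPL increase), copied from the table
QUANTS = [
    ("Q4_K_M", 113.82, 4.96, 4.718049, 0.003762),
    ("IQ4_XS", 88.90, 3.88, 4.822499, 0.025984),
    ("IQ3_XXS", 73.10, 3.19, 4.882908, 0.038836),
]


def implied_weights(size_gib: float, bpw: float) -> float:
    """Weight count implied by size and BPW (assumes BPW = total bits / weights)."""
    return size_gib * GIB * 8 / bpw


def implied_base_ppl(ppl_q: float, rel_increase: float) -> float:
    """Baseline PPL implied by a quant's PPL (assumes PPL(Q) = PPL(base) * (1 + increase))."""
    return ppl_q / (1.0 + rel_increase)


if __name__ == "__main__":
    for name, size, bpw, ppl, rel in QUANTS:
        weights = implied_weights(size, bpw)
        base = implied_base_ppl(ppl, rel)
        print(f"{name:8s} ~{weights / 1e9:.1f}B weights, implied base PPL ~{base:.4f}")
```

If both assumptions hold, the three rows should give nearly the same weight count and nearly the same baseline PPL, which is a quick way to spot a mistyped table entry.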