Performance indicators
Since llama.cpp has KLD and PPL tools available, it would be nice if you could also publish such figures comparing your quantization performance against other similar efforts (e.g., AesSedai, Unsloth, ubergarm, etc.). This would create more visibility and trust in your quants! :)
Something like: "Quant" vs "Size" vs "Q-Mixture PPL" vs "Mean PPL(Q)/PPL(base)" vs "KLD".
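For anyone unfamiliar with the metrics being requested: llama.cpp's `llama-perplexity` tool can report these directly, but the quantities themselves are simple. Below is a minimal, illustrative Python sketch (not llama.cpp's implementation) of what "PPL", the "PPL(Q)/PPL(base)" ratio, and mean token-level KLD measure; the toy logits at the bottom are made up for demonstration.

```python
import math

def perplexity(logprobs):
    """Perplexity from per-token natural-log probabilities of the eval text.
    Lower is better; PPL(Q)/PPL(base) close to 1.0 means little quality loss."""
    return math.exp(-sum(logprobs) / len(logprobs))

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mean_kld(base_logits, quant_logits):
    """Mean per-token KL divergence D_KL(base || quant) over the vocab
    distribution: how much the quantized model's next-token distribution
    drifts from the full-precision model's. 0 means identical outputs."""
    total = 0.0
    for b, q in zip(base_logits, quant_logits):
        p, r = softmax(b), softmax(q)
        total += sum(pi * math.log(pi / ri) for pi, ri in zip(p, r) if pi > 0)
    return total / len(base_logits)

# Toy check: identical logits give zero divergence.
toy = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
print(mean_kld(toy, toy))          # 0.0
print(perplexity([math.log(0.5)] * 4))  # 2.0 (model assigns p=0.5 per token)
```

In practice you would not compute this by hand: llama.cpp's perplexity tool can dump base-model logits once and then score each quant against them, which is presumably how the other quant publishers mentioned above generate their tables.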
Either they do not have the resources for it, or they simply ignore their users' requests.
Is Intel afraid to show some figures?
I too would love to see some comparisons of AutoRound vs. others.
Sorry, this is a great suggestion. However, as a very small team focused on engineering and algorithms, we currently don't have the resources to support this effort, especially for large models. For those large models, due to resource constraints, we use an algorithm similar to llama.cpp's, but with a different mixed-bit strategy.
We have run some accuracy tests on smaller models: https://github.com/intel/auto-round/blob/main/docs/gguf_alg_ext_acc.md for GGUF, and https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard for INT4. If you notice a gap between this model and others, please let us know. We will definitely look into it and investigate further.
At least they are honest about it; that will do for now. After all, your mixed-bit strategy squeezes quants more than anything else I have seen available on Hugging Face, and the models are still fairly usable. But inference speed is affected very much.
@Wenhuach, take a look here: https://huggingface.co/AesSedai/Qwen3.5-397B-A17B-GGUF/discussions/7 you may benefit from it. Can you apply the same PR to your AutoRound quants?