Performance indicators
Since llama.cpp has KLD and PPL tools available, it would be nice if you could also publish such figures comparing your quantization performance against other similar efforts (e.g., AesSedai, Unsloth, ubergarm, etc.). This would create more visibility and trust in your quants! :)
Something like: "Quant" vs "Size" vs "Q-Mixture PPL" vs "Mean PPL(Q)/PPL(base)" vs "KLD".
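For anyone unfamiliar with the metrics being requested: llama.cpp's `llama-perplexity` tool can report these directly, but the quantities themselves are simple. Below is a minimal, illustrative Python sketch (not llama.cpp's implementation) of what "PPL", the "PPL(Q)/PPL(base)" ratio, and mean token-level KLD measure; the toy logits at the bottom are made up for demonstration.

```python
import math

def perplexity(logprobs):
    """Perplexity from per-token natural-log probabilities of the eval text.
    Lower is better; PPL(Q)/PPL(base) close to 1.0 means little quality loss."""
    return math.exp(-sum(logprobs) / len(logprobs))

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mean_kld(base_logits, quant_logits):
    """Mean per-token KL divergence D_KL(base || quant) over the vocab
    distribution: how much the quantized model's next-token distribution
    drifts from the full-precision model's. 0 means identical outputs."""
    total = 0.0
    for b, q in zip(base_logits, quant_logits):
        p, r = softmax(b), softmax(q)
        total += sum(pi * math.log(pi / ri) for pi, ri in zip(p, r) if pi > 0)
    return total / len(base_logits)

# Toy check: identical logits give zero divergence.
toy = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
print(mean_kld(toy, toy))          # 0.0
print(perplexity([math.log(0.5)] * 4))  # 2.0 (model assigns p=0.5 per token)
```

In practice you would not compute this by hand: llama.cpp's perplexity tool can dump base-model logits once and then score each quant against them, which is presumably how the other quant publishers mentioned above generate their tables.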
Either they do not have the resources for it, or they simply ignore their users' requests.
Is Intel afraid to show some figures?
I too would love to see some comparisons of AutoRound vs. others.
Sorry, this is a great suggestion. However, as a very small team focused on engineering and algorithms, we currently don't have the resources to support this effort, especially for large models. For those large models, due to resource constraints, we use an algorithm similar to llama.cpp's, but with a different mixed-bit strategy.
We have run some accuracy tests on smaller models: https://github.com/intel/auto-round/blob/main/docs/gguf_alg_ext_acc.md for GGUF, and https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard for INT4. If you notice a gap between this model and others, please let us know. We will definitely look into it and investigate further.
At least they are honest about it; that will do for now. After all, your mixed-bit strategy squeezes quants more than anything else I have seen available on Hugging Face, and the models are still fairly usable. But inference speed is affected very much.
@Wenhuach, take a look here: https://huggingface.co/AesSedai/Qwen3.5-397B-A17B-GGUF/discussions/7 you may benefit from it. Can you apply the same PR to your AutoRound quants?