what's the difference between the q4_0 and q4_1?

#1
by alanwang - opened

I see there are two int4 models: one is q4_0, the other is q4_1. Could you tell me what the difference between q4_0 and q4_1 is? Thanks!

Owner

These are the original quantizations that llama.cpp had. The difference between them is that Q4_1 stores an additional parameter per block (an offset alongside the scale) to try to get better quantization accuracy than Q4_0. Overall they are quite similar, and they are not used much today except where speed matters more than accuracy.
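To make the "additional parameter" concrete, here is a rough Python sketch of the two schemes applied to a single block of weights. It is a simplified illustration loosely modeled on the reference behavior, not llama.cpp's actual code: the function names are made up for this example, values stay as Python floats instead of fp16, and the 4-bit codes are not packed into bytes.

```python
def quantize_q4_0(block):
    # Q4_0: one parameter per block, a scale d. Codes 0..15 are centered on 8.
    amax = max(block, key=abs)  # signed value with the largest magnitude
    d = amax / -8.0 if amax != 0 else 1.0
    codes = [min(15, max(0, int(x / d + 8.5))) for x in block]
    return d, codes


def dequantize_q4_0(d, codes):
    return [(q - 8) * d for q in codes]


def quantize_q4_1(block):
    # Q4_1: two parameters per block, a scale d and an offset m (the block minimum).
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15.0 if hi != lo else 1.0
    codes = [min(15, max(0, int((x - lo) / d + 0.5))) for x in block]
    return d, lo, codes


def dequantize_q4_1(d, m, codes):
    return [q * d + m for q in codes]


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# A hypothetical block of 32 weights that are all positive, i.e. not centered on zero.
block = [1.0 + 0.01 * i for i in range(32)]

d0, codes0 = quantize_q4_0(block)
d1, m1, codes1 = quantize_q4_1(block)

err_q4_0 = max_error(block, dequantize_q4_0(d0, codes0))
err_q4_1 = max_error(block, dequantize_q4_1(d1, m1, codes1))
print("Q4_0 max error:", err_q4_0)
print("Q4_1 max error:", err_q4_1)
```

On a block like this one, whose values are not centered on zero, the offset lets Q4_1 spend all 16 levels on the range the weights actually cover, so its error comes out smaller. The cost is one more stored value per block, which makes Q4_1 files slightly larger.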

These models don't support the newer K-quantizations that are popular today because their matrix dimensions are not compatible. An option was added to llama.cpp to change the K number, but that would be incompatible with all other models, so it is probably not used at all. I didn't add those versions here, to avoid any confusion.

More about gguf quantizations: https://huggingface.co/docs/hub/en/gguf#quantization-types

SlyEcho changed discussion status to closed
