The 397B feels a bit broken, while the 122B and smaller ones seem normal

#4
by Nekotekina - opened

I heard the Unsloth team had some problems with attention blocks, according to comments in this Reddit thread: https://old.reddit.com/r/LocalLLaMA/comments/1rfds1h/qwen3535ba3b_q4_quantization_comparison/
Given that these quantizations are relatively old, maybe they have similar issues?
UPD: it's hard to describe exactly what I'm doing, but in my evaluation the model seems to fail at speaking Russian once the context reaches about 12k tokens.

Actually, never mind; I don't have enough data to back up the assumed brokenness. My only real problem is that the model doesn't fit nicely in 256 GB (it needs to be about 220 GB, because other processes on the PC need RAM too).

Nekotekina changed discussion status to closed

I don't think there has been a change in either convert_hf_to_gguf or llama-quantize for this model, so I don't think these quantizations would have an issue with that. Mine use Q8_0 for attention and for the rest of the tensors that aren't the conditional expert FFNs, except IQ3_S, which uses Q6_K for those instead.
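For a rough sense of how this two-tier scheme affects the memory-fit question above, here is a minimal sketch estimating file size from bits-per-weight. The Q8_0 and Q6_K values follow llama.cpp's block layouts (34 bytes per 32 weights, 210 bytes per 256 weights); the IQ3_S value and the 90/10 expert/non-expert parameter split are assumptions for illustration, not the actual tensor breakdown of this model.

```python
# Rough GGUF size estimate for a mixed-precision MoE quantization.
# Bits-per-weight (bpw) from llama.cpp block layouts:
#   Q8_0  = 8.5    (34 bytes per 32-weight block)
#   Q6_K  = 6.5625 (210 bytes per 256-weight block)
#   IQ3_S ~ 3.44   (approximate)
BPW = {"Q8_0": 8.5, "Q6_K": 6.5625, "IQ3_S": 3.44}

def estimate_gib(total_params: float, expert_frac: float,
                 expert_quant: str, other_quant: str) -> float:
    """Approximate file size in GiB for a two-tier quantization:
    expert FFN weights at one quant, everything else at another."""
    expert_bits = total_params * expert_frac * BPW[expert_quant]
    other_bits = total_params * (1 - expert_frac) * BPW[other_quant]
    return (expert_bits + other_bits) / 8 / 2**30

# 397e9 params, ASSUMING ~90% sit in the conditional expert FFNs:
size = estimate_gib(397e9, 0.90, "IQ3_S", "Q8_0")
print(f"~{size:.0f} GiB")  # roughly 182 GiB under these assumptions
```

Under these assumed numbers the IQ3_S variant would land comfortably below the 220 GB budget mentioned above, which is consistent with it being the fallback choice here.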
