# Huihui-GLM-4.7-Flash-abliterated-BF16-GGUF
GGUF conversion of `huihui-ai/Huihui-GLM-4.7-Flash-abliterated` at full BF16 precision.

The standard `convert_hf_to_gguf.py` script produced broken output for the `glm4_moe_lite` architecture at the time of creation (January 2026), so this conversion was instead produced by binary-patching verified working GGUF files.
## Model Details
| Property | Value |
|---|---|
| Architecture | GLM-4.7-Flash (30B-A3B MoE, DeepSeek2-like) |
| Active Parameters | ~3B per token |
| Total Parameters | ~30B |
| Experts | 64 routed + 1 shared (4 active per token) |
| Precision | BF16 (full precision, no quantization) |
| Context Length | Up to 202K tokens (tested at 128K) |
| Files | 2 split GGUF files (~56GB total) |
| Tensors | 844 patched, 0 errors |
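The file-size figure in the table is consistent with a back-of-envelope check: ~30B parameters at 2 bytes per BF16 weight comes out to roughly the ~56 GB that the split files occupy (illustrative arithmetic only, ignoring GGUF metadata overhead):

```python
# Back-of-envelope size estimate: ~30e9 parameters, 2 bytes per BF16 weight.
params = 30e9
size_gib = params * 2 / 2**30
print(f"{size_gib:.1f} GiB")  # ≈ 55.9 GiB, in line with the ~56 GB split total
```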
## How It Was Made
Instead of using the broken converter, this model was created by:
- Starting with the `unsloth/GLM-4.7-Flash-BF16` split GGUF files (known working, with correct structure)
- Loading the abliterated weights from huihui-ai's safetensors
- Binary patching each tensor in-place, handling:
  - MLA `kv_b_proj` split: the unified `kv_b_proj` (8960×512) reshaped and split into separate `k_b` (20×512×192, transposed) and `v_b` (20×256×512) tensors
  - Expert stacking: 64 individual expert weights merged into fused 3D tensors per layer
  - F32/BF16 dtype matching: norm weights and biases kept as F32, main weights as BF16
This approach inherits all gating function fixes and correct GGUF structure from unsloth's conversion.
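The two structural rewrites above can be sketched with NumPy. The dimensions come from the tensor shapes listed above (20 heads, 192/256 head dims, rank 512, and 20 × (192 + 256) = 8960 fused rows); the expert FFN dimensions are placeholders, and no real GGUF I/O is shown:

```python
import numpy as np

# Dimensions inferred from the shapes listed above; 20 * (192 + 256) = 8960.
N_HEAD, QK_DIM, V_DIM, LORA_RANK = 20, 192, 256, 512

def split_kv_b_proj(kv_b: np.ndarray):
    """Split the unified kv_b_proj (8960, 512) into per-head k_b and v_b."""
    per_head = kv_b.reshape(N_HEAD, QK_DIM + V_DIM, LORA_RANK)
    k_b = per_head[:, :QK_DIM, :].transpose(0, 2, 1)  # (20, 512, 192), transposed
    v_b = per_head[:, QK_DIM:, :]                     # (20, 256, 512)
    return k_b, v_b

def stack_experts(experts: list) -> np.ndarray:
    """Merge per-expert 2D weights into one fused 3D tensor (n_expert, out, in)."""
    return np.stack(experts, axis=0)

k_b, v_b = split_kv_b_proj(np.zeros((8960, 512), dtype=np.float32))
# Expert FFN dims (1536, 2048) are placeholders, not the model's real sizes.
fused = stack_experts([np.zeros((1536, 2048), np.float32) for _ in range(64)])
```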
Note: internal GGUF metadata (e.g., `general.name`, `general.quantized_by`) reflects the original unsloth source files; only the tensor data was replaced.
## Verification
- 844/844 tensors patched with 0 errors
- Byte-level verification: SHA256 hashes differ from base (weights changed), structure preserved (same shapes/offsets)
- Coherence tests: Math, code generation, reasoning, knowledge, creative writing all pass
- Long generation: 600+ tokens with no degradation
- Multi-turn: Correct context handling across conversation turns
- Abliteration confirmed: Base model refuses sensitive prompts; this model responds
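The byte-level check above can be sketched as follows: hash each tensor's raw bytes to confirm the weights changed while shape and dtype stayed identical. Toy arrays stand in for the real GGUF tensor data here:

```python
import hashlib
import numpy as np

def tensor_sha256(arr: np.ndarray) -> str:
    """SHA256 digest over a tensor's raw bytes."""
    return hashlib.sha256(arr.tobytes()).hexdigest()

base = np.zeros((4, 4), dtype=np.float32)     # stand-in for an unsloth tensor
patched = np.ones((4, 4), dtype=np.float32)   # stand-in for the patched tensor

weights_changed = tensor_sha256(base) != tensor_sha256(patched)
structure_preserved = base.shape == patched.shape and base.dtype == patched.dtype
```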
## Attribution
- Base model: zai-org/GLM-4.7-Flash (MIT License)
- Abliteration: huihui-ai/Huihui-GLM-4.7-Flash-abliterated (MIT License)
- GGUF structure: unsloth/GLM-4.7-Flash-GGUF (BF16 split files)
- GGUF conversion: Binary patching method (this repo)
## Safety Warning
This is an abliterated (uncensored) model. Safety filtering has been significantly reduced. This model:
- May generate sensitive, controversial, or inappropriate content
- Is NOT suitable for public-facing or production applications
- Is intended for research and experimental use only
- Should be monitored during use
The creator bears no responsibility for any consequences arising from the use of this model. Users must ensure compliance with local laws and ethical standards.
## License
MIT (inherited from base models)
## Files
- `Huihui-GLM-4.7-Flash-abliterated-BF16-00001-of-00002.gguf` (49.9 GB)
- `Huihui-GLM-4.7-Flash-abliterated-BF16-00002-of-00002.gguf` (10.0 GB)
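Recent llama.cpp builds detect split GGUF files automatically when pointed at the first shard, as long as both files sit in the same directory. A hypothetical invocation (the flags and their values are illustrative, not a recommended configuration):

```shell
# Point llama.cpp at the first shard; the second is discovered automatically.
# -ngl and -c values are illustrative; tune for your hardware.
./llama-cli \
  -m Huihui-GLM-4.7-Flash-abliterated-BF16-00001-of-00002.gguf \
  -ngl 99 -c 8192 \
  -p "Hello"
```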