# Huihui-GLM-4.7-Flash-abliterated-BF16-GGUF
GGUF conversion of `huihui-ai/Huihui-GLM-4.7-Flash-abliterated` at full BF16 precision.

The standard `convert_hf_to_gguf.py` script produced broken output for the `glm4_moe_lite` architecture at the time of creation (January 2026), so this conversion was instead produced by binary-patching verified working GGUF files.
## Model Details
| Property | Value |
|---|---|
| Architecture | GLM-4.7-Flash (30B-A3B MoE, DeepSeek2-like) |
| Active Parameters | ~3B per token |
| Total Parameters | ~30B |
| Experts | 64 routed + 1 shared (4 active per token) |
| Precision | BF16 (full precision, no quantization) |
| Context Length | Up to 202K tokens (tested at 128K) |
| Files | 2 split GGUF files (~56GB total) |
| Tensors | 844 patched, 0 errors |
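The file-size figure in the table is consistent with a back-of-envelope check: ~30B parameters at 2 bytes per BF16 weight comes out to roughly the ~56 GB that the split files occupy (illustrative arithmetic only, ignoring GGUF metadata overhead):

```python
# Back-of-envelope size estimate: ~30e9 parameters, 2 bytes per BF16 weight.
params = 30e9
size_gib = params * 2 / 2**30
print(f"{size_gib:.1f} GiB")  # ≈ 55.9 GiB, in line with the ~56 GB split total
```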
## How It Was Made
Instead of using the broken converter, this model was created by:
- Starting with the `unsloth/GLM-4.7-Flash-BF16` split GGUF files (known working, with correct structure)
- Loading the abliterated weights from huihui-ai's safetensors
- Binary patching each tensor in-place, handling:
  - MLA `kv_b_proj` split: the unified `kv_b_proj` (8960×512) reshaped and split into separate `k_b` (20×512×192, transposed) and `v_b` (20×256×512) tensors
  - Expert stacking: 64 individual expert weights merged into fused 3D tensors per layer
  - F32/BF16 dtype matching: norm weights and biases kept as F32, main weights as BF16
This approach inherits all gating function fixes and correct GGUF structure from unsloth's conversion.
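The two structural rewrites above can be sketched with NumPy. The dimensions come from the tensor shapes listed above (20 heads, 192/256 head dims, rank 512, and 20 × (192 + 256) = 8960 fused rows); the expert FFN dimensions are placeholders, and no real GGUF I/O is shown:

```python
import numpy as np

# Dimensions inferred from the shapes listed above; 20 * (192 + 256) = 8960.
N_HEAD, QK_DIM, V_DIM, LORA_RANK = 20, 192, 256, 512

def split_kv_b_proj(kv_b: np.ndarray):
    """Split the unified kv_b_proj (8960, 512) into per-head k_b and v_b."""
    per_head = kv_b.reshape(N_HEAD, QK_DIM + V_DIM, LORA_RANK)
    k_b = per_head[:, :QK_DIM, :].transpose(0, 2, 1)  # (20, 512, 192), transposed
    v_b = per_head[:, QK_DIM:, :]                     # (20, 256, 512)
    return k_b, v_b

def stack_experts(experts: list) -> np.ndarray:
    """Merge per-expert 2D weights into one fused 3D tensor (n_expert, out, in)."""
    return np.stack(experts, axis=0)

k_b, v_b = split_kv_b_proj(np.zeros((8960, 512), dtype=np.float32))
# Expert FFN dims (1536, 2048) are placeholders, not the model's real sizes.
fused = stack_experts([np.zeros((1536, 2048), np.float32) for _ in range(64)])
```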
Note: internal GGUF metadata (e.g., `general.name`, `general.quantized_by`) reflects the original unsloth source files; only the tensor data was replaced.
## Verification
- 844/844 tensors patched with 0 errors
- Byte-level verification: SHA256 hashes differ from base (weights changed), structure preserved (same shapes/offsets)
- Coherence tests: Math, code generation, reasoning, knowledge, creative writing all pass
- Long generation: 600+ tokens with no degradation
- Multi-turn: Correct context handling across conversation turns
- Abliteration confirmed: Base model refuses sensitive prompts; this model responds
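The byte-level check above can be sketched as follows: hash each tensor's raw bytes to confirm the weights changed while shape and dtype stayed identical. Toy arrays stand in for the real GGUF tensor data here:

```python
import hashlib
import numpy as np

def tensor_sha256(arr: np.ndarray) -> str:
    """SHA256 digest over a tensor's raw bytes."""
    return hashlib.sha256(arr.tobytes()).hexdigest()

base = np.zeros((4, 4), dtype=np.float32)     # stand-in for an unsloth tensor
patched = np.ones((4, 4), dtype=np.float32)   # stand-in for the patched tensor

weights_changed = tensor_sha256(base) != tensor_sha256(patched)
structure_preserved = base.shape == patched.shape and base.dtype == patched.dtype
```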
## Attribution
- Base model: zai-org/GLM-4.7-Flash (MIT License)
- Abliteration: huihui-ai/Huihui-GLM-4.7-Flash-abliterated (MIT License)
- GGUF structure: unsloth/GLM-4.7-Flash-GGUF (BF16 split files)
- GGUF conversion: Binary patching method (this repo)
## Safety Warning
This is an abliterated (uncensored) model. Safety filtering has been significantly reduced. This model:
- May generate sensitive, controversial, or inappropriate content
- Is NOT suitable for public-facing or production applications
- Is intended for research and experimental use only
- Should be monitored during use
The creator bears no responsibility for any consequences arising from the use of this model. Users must ensure compliance with local laws and ethical standards.
## License
MIT (inherited from base models)
## Files
- `Huihui-GLM-4.7-Flash-abliterated-BF16-00001-of-00002.gguf` (49.9 GB)
- `Huihui-GLM-4.7-Flash-abliterated-BF16-00002-of-00002.gguf` (10.0 GB)
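Recent llama.cpp builds detect split GGUF files automatically when pointed at the first shard, as long as both files sit in the same directory. A hypothetical invocation (the flags and their values are illustrative, not a recommended configuration):

```shell
# Point llama.cpp at the first shard; the second is discovered automatically.
# -ngl and -c values are illustrative; tune for your hardware.
./llama-cli \
  -m Huihui-GLM-4.7-Flash-abliterated-BF16-00001-of-00002.gguf \
  -ngl 99 -c 8192 \
  -p "Hello"
```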