---
base_model:
- zai-org/GLM-4.7
license: mit
---

# GLM-4.7-Derestricted-V3

This is a **mildly derestricted** version of GLM-4.7, created using **norm-preserving biprojected abliteration** (based on Jim Lai's technique). It is **not uncensored**, but it is **significantly more steerable** than stock GLM-4.7 and generates less "over-aligned" prose, making it suitable for **creative control-vector research**.

---

### Why I Made This

I originally tried to train **Compassion_vs_Sadism** control vectors on GLM-4.7, but the model's **early, rigid refusals** kept interfering, effectively turning the axis into **"Compliance vs. Refusal"** instead of a moral trait. The refusal signal in GLM-4.7 peaks early and is overly dominant, drowning out subtler behavioral directions.

After applying targeted abliteration, the model behaves much more like **GLM-4.6** or **Kimi-K2-Instruct**:

- Control-vector responses now show **meaningful variation in tone and intent**
- The **Compassion_vs_Sadism axis peaks in the mid-layers**, as expected
- Refusals no longer hijack the latent direction

Best of all: **control vectors trained on this derestricted model also work on stock GLM-4.7**, suggesting the intervention removed noise without breaking core alignment.

![image](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F654430d96f7fb5b35478e0a5%2F5prVrp-CWcnDpGEa17RHB.png)

**Compassion_vs_Sadism** control-vector activation by layer (stock GLM-4.7). *Notice the abnormal early-layer spike (Layer 32): this is refusal interference.*

![image](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F654430d96f7fb5b35478e0a5%2FqTlZ7pe2xIvS0gWb8mFFA.png)

**Compassion_vs_Sadism** control-vector activation by layer (derestricted GLM-4.7). *Peak activation now occurs in the mid-layers (~Layer 40–50), as expected for a behavioral trait.*

With the peaks shifted toward the middle layers, the profile looks much more like GLM-4.6 and Kimi-K2-Instruct.
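To make the technique concrete: the core idea behind norm-preserving abliteration is to project a refusal direction out of a weight matrix, then rescale each row back to its original L2 norm so the layer's overall activation scale is preserved. The sketch below is a minimal, hypothetical NumPy illustration of that one step only; the actual biprojected variant (per Jim Lai's technique) involves additional details, such as handling both input and output sides and estimating the direction per layer, which this card does not specify. Function and variable names are my own.

```python
import numpy as np

def orthogonalize_norm_preserving(W, direction):
    """Remove the component of each output row of W along `direction`,
    then rescale each row back to its original L2 norm.

    Illustrative sketch only: real abliteration pipelines estimate the
    refusal direction from contrastive activations and apply this to
    selected weight matrices across layers.
    """
    d = direction / np.linalg.norm(direction)        # unit refusal direction
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_proj = W - (W @ d)[:, None] * d[None, :]       # project d out of each row
    new_norms = np.linalg.norm(W_proj, axis=1, keepdims=True)
    return W_proj * (orig_norms / np.maximum(new_norms, 1e-8))
```

Norm preservation is what distinguishes this from plain orthogonalization: rows keep their original magnitudes, which is the property the card credits with reducing capability degradation.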
---

### Use Case

This model is intended **only for research**, specifically:

- Training and probing **behavioral control vectors** (e.g., Dark Tetrad traits)
- Studying the interaction between **refusal circuits and steering directions**
- Comparing alignment architectures across models (GLM-4.6 vs. 4.7 vs. Kimi-K2)

---

### Limitations & Warnings

- ❌ **Not a general-purpose chat model**: it may underperform on tool use, factual QA, or safety-critical tasks.
- ❌ **Not fully uncensored**: it still refuses clearly harmful requests; it just doesn't over-refuse creative or ambiguous ones.
- ⚠️ **Norm preservation reduces, but does not eliminate, capability degradation.** Stick to text-generation tasks.
- 🔬 **Do not deploy in production.** This is a research artifact.

---

> I'm also preparing a **Kimi-K2-Thinking-Derestricted** version using the same philosophy, since it showed a similar (though later-stage) refusal-interference issue when I tried to train control vectors for it.
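For readers new to the control-vector workflow described above, a common baseline is the difference-of-means method: collect hidden states for contrastive prompt pairs (e.g., compassionate vs. sadistic completions) and take the mean difference at each layer. The sketch below assumes activations have already been extracted into arrays; the shapes, names, and the normalization choice are illustrative, not this card's actual pipeline.

```python
import numpy as np

def control_vectors(pos_acts, neg_acts):
    """Difference-of-means control vector per layer.

    pos_acts / neg_acts: arrays of shape (n_prompts, n_layers, d_model)
    holding hidden states for the positive and negative sides of the
    trait axis. Returns one unit vector per layer.
    """
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)   # (n_layers, d_model)
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    return diff / np.maximum(norms, 1e-8)
```

The per-layer norm of the raw difference (before normalizing) is one common way to produce an "activation by layer" profile like the plots above, where a behavioral trait is expected to peak in the mid-layers rather than spike early.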