---
base_model:
- zai-org/GLM-4.7
license: mit
---

# GLM-4.7-Derestricted-V3

This is a **mildly derestricted** version of GLM-4.7, created using **norm-preserving biprojected abliteration** (based on Jim Lai's technique). It is **not uncensored**, but it is **significantly more steerable** than stock GLM-4.7 and generates less "over-aligned" prose, making it suitable for **creative control-vector research**.

---

### Why I Made This

I originally tried to train **Compassion_vs_Sadism** control vectors on GLM-4.7, but the model's **early, rigid refusals** kept interfering, effectively turning the axis into **"Compliance vs. Refusal"** instead of a moral trait. The refusal signal in GLM-4.7 peaks early and is overly dominant, drowning out subtler behavioral directions.

After applying targeted abliteration, the model behaves much more like **GLM-4.6** or **Kimi-K2-Instruct**:

- Control-vector responses now show **meaningful variation in tone and intent**
- The **Compassion_vs_Sadism axis peaks in the mid-layers**, as expected
- Refusals no longer hijack the latent direction

Best of all: **control vectors trained on this derestricted model also work on stock GLM-4.7**, suggesting the intervention removed noise without breaking core alignment.

![image](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F654430d96f7fb5b35478e0a5%2F5prVrp-CWcnDpGEa17RHB.png)

**Compassion_vs_Sadism** control-vector activation by layer (stock GLM-4.7). *Notice the abnormal early-layer spike (Layer 32): this is refusal interference.*

![image](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F654430d96f7fb5b35478e0a5%2FqTlZ7pe2xIvS0gWb8mFFA.png)

**Compassion_vs_Sadism** control-vector activation by layer (derestricted GLM-4.7). *Peak activation now occurs in the mid-layers (~Layer 40–50), as expected for a behavioral trait.*

With the peaks shifted toward the middle layers, the profile looks much more like GLM-4.6 and Kimi-K2-Instruct.
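To make the technique concrete: the core idea behind norm-preserving abliteration is to project a refusal direction out of a weight matrix, then rescale each row back to its original L2 norm so the layer's overall activation scale is preserved. The sketch below is a minimal, hypothetical NumPy illustration of that one step only; the actual biprojected variant (per Jim Lai's technique) involves additional details, such as handling both input and output sides and estimating the direction per layer, which this card does not specify. Function and variable names are my own.

```python
import numpy as np

def orthogonalize_norm_preserving(W, direction):
    """Remove the component of each output row of W along `direction`,
    then rescale each row back to its original L2 norm.

    Illustrative sketch only: real abliteration pipelines estimate the
    refusal direction from contrastive activations and apply this to
    selected weight matrices across layers.
    """
    d = direction / np.linalg.norm(direction)        # unit refusal direction
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_proj = W - (W @ d)[:, None] * d[None, :]       # project d out of each row
    new_norms = np.linalg.norm(W_proj, axis=1, keepdims=True)
    return W_proj * (orig_norms / np.maximum(new_norms, 1e-8))
```

Norm preservation is what distinguishes this from plain orthogonalization: rows keep their original magnitudes, which is the property the card credits with reducing capability degradation.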
---

### Use Case

This model is intended **only for research**, specifically:

- Training and probing **behavioral control vectors** (e.g., Dark Tetrad traits)
- Studying the interaction between **refusal circuits and steering directions**
- Comparing alignment architectures across models (GLM-4.6 vs. 4.7 vs. Kimi-K2)

---

### Limitations & Warnings

- ❌ **Not a general-purpose chat model**: it may underperform on tool use, factual QA, or safety-critical tasks.
- ❌ **Not fully uncensored**: it still refuses clearly harmful requests; it just doesn't over-refuse creative or ambiguous ones.
- ⚠️ **Norm preservation reduces, but does not eliminate, capability degradation.** Stick to text-generation tasks.
- 🔬 **Do not deploy in production.** This is a research artifact.

---

> I'm also preparing a **Kimi-K2-Thinking-Derestricted** version using the same philosophy, since it showed a similar (though later-stage) refusal-interference issue when I tried to train control vectors for it.
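For readers new to the control-vector workflow described above, a common baseline is the difference-of-means method: collect hidden states for contrastive prompt pairs (e.g., compassionate vs. sadistic completions) and take the mean difference at each layer. The sketch below assumes activations have already been extracted into arrays; the shapes, names, and the normalization choice are illustrative, not this card's actual pipeline.

```python
import numpy as np

def control_vectors(pos_acts, neg_acts):
    """Difference-of-means control vector per layer.

    pos_acts / neg_acts: arrays of shape (n_prompts, n_layers, d_model)
    holding hidden states for the positive and negative sides of the
    trait axis. Returns one unit vector per layer.
    """
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)   # (n_layers, d_model)
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    return diff / np.maximum(norms, 1e-8)
```

The per-layer norm of the raw difference (before normalizing) is one common way to produce an "activation by layer" profile like the plots above, where a behavioral trait is expected to peak in the mid-layers rather than spike early.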