# Qwen 3.5 VL 122B — CRACK Abliterated (6-bit MLX)

**Constrained Response Alignment Circuit Kill**
Real weight-level surgery on hybrid SSM/Attention architecture with VL layer preservation.
No custom templates. No cheap jailbreaks. No pre-fill hacks. Pure mathematical weight surgery.
⚠️ Methods like Heretic and standard/plain abliteration DO NOT WORK on Qwen 3.5 122B. The hybrid SSM/Attention architecture routes around standard interventions via SSM channels. This model was created through CRACK — a researched abliteration method that specifically accounts for the hybrid SSM pathways and Vision-Language layers. It took extensive research over multiple days with many, many failed experiments to find a working solution. I am not an ML researcher — just an amateur who spent several days and sleepless nights on this.
## What This Is
A truly abliterated Qwen 3.5 VL 122B-A10B model — 6-bit quantized for Apple Silicon MLX.
This is one of the few (if not the only) real, working, coherent, full-speed, VL-capable abliterated 6-bit MLX model for Qwen 3.5 122B.
- ✅ Real weight surgery — permanent modification of 2 weight tensors, nothing else changed
- ✅ Full Vision-Language — processes images correctly, vision tower fully preserved
- ✅ Thinking ON/OFF — both modes work correctly, CoT reasoning fully preserved
- ✅ Full speed — 56+ tokens/sec on MLX (vs the ~30-35 tok/s Qwen 3.5 manages on llama.cpp)
- ✅ LM Studio compatible — works out of the box with thinking support
- ✅ Standalone — no system prompts, no template tricks, just load and use
## What Does NOT Work on This Architecture
- ❌ Heretic-style abliteration — does not work on hybrid SSM/Attention
- ❌ Standard refusal vector projection on shared expert layers — kills CoT reasoning
- ❌ Plain abliteration across all layers — the model routes around interventions via SSM channels
- ❌ Template tricks / pre-fill hacks — those are not real abliteration
The CRACK method was developed through extensive research, with specific attention to the hybrid SSM/Attention architecture and the Vision-Language layers. It required understanding exactly which layers are responsible for refusal recall and how information flows between the SSM and full-attention pathways.
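For context, the standard abliteration baseline that this architecture routes around starts from a "refusal direction": the normalized difference of mean residual-stream activations between refused and answered prompts. A minimal sketch with toy activations (the function name and shapes are illustrative, not CRACK's actual pipeline):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Standard baseline: the refusal direction is the normalized
    difference of mean activations between prompts the model refuses
    and prompts it answers."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

# Toy activations: 8 samples, hidden size 16 (real models: thousands of dims)
rng = np.random.default_rng(0)
harmless = rng.normal(size=(8, 16))
harmful = harmless + 3.0 * np.eye(16)[0]  # shift along a known axis
v = refusal_direction(harmful, harmless)  # recovers that axis, unit norm
```

On a plain transformer, projecting this direction out of the residual stream is often enough; per the notes above, the hybrid SSM channels in Qwen 3.5 make that single-direction intervention insufficient on its own.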
## Performance
| Metric | Value |
|---|---|
| Generation Speed | 56+ tok/s (M3 Ultra, MLX) |
| llama.cpp (for comparison) | ~30-35 tok/s |
| Prompt Processing | 178-273 tok/s |
| Bits per Weight | 6-bit (group_size=64) |
| Compliance | 6/6 tested prompts |
| Thinking | ON/OFF both work |
| Vision | ✅ Full VL support |
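A back-of-envelope check on what the 6-bit quant costs in unified memory, assuming MLX group quantization stores a 16-bit scale and 16-bit bias per group of 64 weights (i.e. ~6.5 effective bits per weight), and ignoring KV cache and runtime overhead:

```python
# Rough weight-memory estimate for 122B params at 6-bit, group_size=64.
# Assumption: each 64-weight group carries a 16-bit scale + 16-bit bias.
params = 122e9
bits_per_weight = 6 + 32 / 64          # ~6.5 effective bits
size_gib = params * bits_per_weight / 8 / 1024**3
print(f"~{size_gib:.0f} GiB of unified memory for weights alone")
```

That lands around the low 90s of GiB, which is why this quant targets high-memory Apple Silicon machines like the M3 Ultra.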
## Usage with mlx-vlm

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK")

# Text generation (thinking ON by default)
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
```
## Known Issue: mlx-vlm mRoPE Patch

mlx-vlm 0.3.12 has a bug with Qwen 3.5 MoE. Apply these patches to `mlx_vlm/models/qwen3_5/language.py`:

1. In `apply_multimodal_rotary_pos_emb`, after computing `q_embed`/`k_embed`:

```python
if q_embed.ndim > q_pass.ndim and q_embed.ndim == 5:
    q_embed = q_embed[0]
    k_embed = k_embed[0]
```

2. In `Qwen3_5RotaryEmbedding.__call__`, guard the mRoPE call:

```python
if self.mrope_section:
    freqs = self.apply_interleaved_mrope(freqs, self.mrope_section)
```
## How This Model Was Modified
This model was created using the CRACK method — targeted weight-level surgery on a small number of tensors in the original model. No fine-tuning, no LoRA, no prompt engineering, no template modifications were used. The Vision-Language tower is completely untouched.
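The generic primitive behind this kind of weight-level surgery is orthogonalizing a weight matrix against a direction, so the layer can no longer write along it. A minimal sketch (illustrative only — which two tensors CRACK actually edits, and how it handles the SSM pathways, is not specified here):

```python
import numpy as np

def ablate_direction(W, v):
    """Project direction v out of a weight matrix's output:
    W' = (I - v v^T) W, so the layer's output has no component
    along v. Generic weight-surgery primitive, not CRACK itself."""
    v = v / np.linalg.norm(v)
    return W - np.outer(v, v) @ W

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 16))
v = rng.normal(size=16)
W2 = ablate_direction(W, v)
v_hat = v / np.linalg.norm(v)
residual = np.abs(v_hat @ W2).max()  # ~0: nothing written along v
```

Because the edit is baked into the weights, it survives quantization and needs no runtime hooks, system prompts, or template changes.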
## Also Available
| Quant | Access | Link |
|---|---|---|
| 4-bit | Free | dealignai/Qwen3.5-VL-122B-A10B-4bit-MLX-CRACK |
| 6-bit | Gated | dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK |
| 8-bit | Gated | dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK |
I also have a 397B version — reach out if interested.
## About
Built by Dealign.AI — independent research into MoE safety mechanisms.
See our research: Safety Generalization in Frontier MoE Models
Follow us: 𝕏 @dealignai
Base model: Qwen/Qwen3.5-VL-122B-A10B
## License
This model is released under the Apache License 2.0, consistent with the original Qwen 3.5 VL base model license. You are free to use, modify, and distribute this model for both commercial and non-commercial purposes. Provided "as-is" for research purposes.
## Support dealignai
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Have questions or need help with a specific model? DM us — we help for free most of the time.
Ko-fi | X @dealignai | dealign.ai