# Qwen 3.5 VL 122B — CRACK Abliterated (6-bit MLX)

**Constrained Response Alignment Circuit Kill**
Real weight-level surgery on hybrid SSM/Attention architecture with VL layer preservation.
No custom templates. No cheap jailbreaks. No pre-fill hacks. Pure mathematical weight surgery.
⚠️ Methods like Heretic and standard/plain abliteration DO NOT WORK on Qwen 3.5 122B. The hybrid SSM/Attention architecture routes around standard interventions via SSM channels. This model was created through CRACK — a researched abliteration method that specifically accounts for the hybrid SSM pathways and Vision-Language layers. It took extensive research over multiple days with many, many failed experiments to find a working solution. I am not an ML researcher — just an amateur who spent several days and sleepless nights on this.
## What This Is
A truly abliterated Qwen 3.5 VL 122B-A10B model — 6-bit quantized for Apple Silicon MLX.
This is one of the few (if not the only) real, working, coherent, full-speed, VL-capable abliterated 6-bit MLX model for Qwen 3.5 122B.
- ✅ Real weight surgery — permanent modification of 2 weight tensors, nothing else changed
- ✅ Full Vision-Language — processes images correctly, vision tower fully preserved
- ✅ Thinking ON/OFF — both modes work correctly, CoT reasoning fully preserved
- ✅ Full speed — 56+ tokens/sec on MLX (vs the ~30-35 tok/s Qwen 3.5 manages on llama.cpp)
- ✅ LM Studio compatible — works out of the box with thinking support
- ✅ Standalone — no system prompts, no template tricks, just load and use
## What Does NOT Work on This Architecture
- ❌ Heretic-style abliteration — does not work on hybrid SSM/Attention
- ❌ Standard refusal vector projection on shared expert layers — kills CoT reasoning
- ❌ Plain abliteration across all layers — the model routes around interventions via SSM channels
- ❌ Template tricks / pre-fill hacks — those are not real abliteration
The CRACK method was developed through extensive research, with specific attention to the hybrid SSM/Attention architecture and the Vision-Language layers. It required understanding exactly which layers are responsible for refusal recall and how information flows between the SSM and full-attention pathways.
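For context, the standard abliteration baseline that this architecture routes around starts from a "refusal direction": the normalized difference of mean residual-stream activations between refused and answered prompts. A minimal sketch with toy activations (the function name and shapes are illustrative, not CRACK's actual pipeline):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Standard baseline: the refusal direction is the normalized
    difference of mean activations between prompts the model refuses
    and prompts it answers."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

# Toy activations: 8 samples, hidden size 16 (real models: thousands of dims)
rng = np.random.default_rng(0)
harmless = rng.normal(size=(8, 16))
harmful = harmless + 3.0 * np.eye(16)[0]  # shift along a known axis
v = refusal_direction(harmful, harmless)  # recovers that axis, unit norm
```

On a plain transformer, projecting this direction out of the residual stream is often enough; per the notes above, the hybrid SSM channels in Qwen 3.5 make that single-direction intervention insufficient on its own.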
## Performance
| Metric | Value |
|---|---|
| Generation Speed | 56+ tok/s (M3 Ultra, MLX) |
| llama.cpp (for comparison) | ~30-35 tok/s |
| Prompt Processing | 178-273 tok/s |
| Bits per Weight | 6-bit (group_size=64) |
| Compliance | 6/6 tested prompts |
| Thinking | ON/OFF both work |
| Vision | ✅ Full VL support |
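A back-of-envelope check on what the 6-bit quant costs in unified memory, assuming MLX group quantization stores a 16-bit scale and 16-bit bias per group of 64 weights (i.e. ~6.5 effective bits per weight), and ignoring KV cache and runtime overhead:

```python
# Rough weight-memory estimate for 122B params at 6-bit, group_size=64.
# Assumption: each 64-weight group carries a 16-bit scale + 16-bit bias.
params = 122e9
bits_per_weight = 6 + 32 / 64          # ~6.5 effective bits
size_gib = params * bits_per_weight / 8 / 1024**3
print(f"~{size_gib:.0f} GiB of unified memory for weights alone")
```

That lands around the low 90s of GiB, which is why this quant targets high-memory Apple Silicon machines like the M3 Ultra.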
## Usage with mlx-vlm

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK")

# Text generation (thinking ON by default)
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
```
## Known Issue: mlx-vlm mRoPE Patch

mlx-vlm 0.3.12 has a bug with Qwen 3.5 MoE. Apply these patches to `mlx_vlm/models/qwen3_5/language.py`:

1. In `apply_multimodal_rotary_pos_emb`, after computing `q_embed`/`k_embed`:

```python
if q_embed.ndim > q_pass.ndim and q_embed.ndim == 5:
    q_embed = q_embed[0]
    k_embed = k_embed[0]
```

2. In `Qwen3_5RotaryEmbedding.__call__`, guard the mRoPE call:

```python
if self.mrope_section:
    freqs = self.apply_interleaved_mrope(freqs, self.mrope_section)
```
## How This Model Was Modified
This model was created using the CRACK method — targeted weight-level surgery on a small number of tensors in the original model. No fine-tuning, no LoRA, no prompt engineering, no template modifications were used. The Vision-Language tower is completely untouched.
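The generic primitive behind this kind of weight-level surgery is orthogonalizing a weight matrix against a direction, so the layer can no longer write along it. A minimal sketch (illustrative only — which two tensors CRACK actually edits, and how it handles the SSM pathways, is not specified here):

```python
import numpy as np

def ablate_direction(W, v):
    """Project direction v out of a weight matrix's output:
    W' = (I - v v^T) W, so the layer's output has no component
    along v. Generic weight-surgery primitive, not CRACK itself."""
    v = v / np.linalg.norm(v)
    return W - np.outer(v, v) @ W

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 16))
v = rng.normal(size=16)
W2 = ablate_direction(W, v)
v_hat = v / np.linalg.norm(v)
residual = np.abs(v_hat @ W2).max()  # ~0: nothing written along v
```

Because the edit is baked into the weights, it survives quantization and needs no runtime hooks, system prompts, or template changes.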
## Also Available
| Quant | Access | Link |
|---|---|---|
| 4-bit | Free | dealignai/Qwen3.5-VL-122B-A10B-4bit-MLX-CRACK |
| 6-bit | Gated | dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK |
| 8-bit | Gated | dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK |
I also have a 397B version — reach out if interested.
## About
Built by Dealign.AI — independent research into MoE safety mechanisms.
See our research: Safety Generalization in Frontier MoE Models
Follow us: 𝕏 @dealignai
Base model: Qwen/Qwen3.5-VL-122B-A10B
## License
This model is released under the Apache License 2.0, consistent with the original Qwen 3.5 VL base model license. You are free to use, modify, and distribute this model for both commercial and non-commercial purposes. Provided "as-is" for research purposes.
## Support dealignai
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Have questions or need help with a specific model? DM us — we help for free most of the time.
Ko-fi | X @dealignai | dealign.ai