# Qwen3.5-9B-Franken-L24-27
A frankenmerged Qwen3.5-9B with layers 24-27 duplicated (32 → 36 layers). No retraining, just layer surgery.

Result: 4/10 → 7/10 on coding benchmarks. A 75% capability improvement from copying 4 layers.
## What is this?
This model was created by duplicating layers 24-27 (the "reasoning core" at 75-84% depth) of a Qwen3.5-9B-abliterated model. The duplicated layers give the model a second pass through its strongest reasoning circuit before generating output.
Based on research across 6 model architectures and 50+ experiments mapping where functional circuits live in transformers. Full writeup: r/LocalLLaMA post
## Benchmark Results
15 LeetCode problems, 3 tiers, code executed against hidden test cases (not LLM-judged):
| Model | Score | Speed |
|---|---|---|
| Qwen3.5-9B (original) | 4/10 | 112 tok/s |
| This model (L24-27 dup) | 7/10 | ~102 tok/s |
Problems gained: three_sum, word_break, longest_common_prefix. Nothing lost from baseline.
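For illustration, here is a minimal sketch of what "code executed against hidden test cases (not LLM-judged)" means: the generated source is executed and its function is called on each case, so only actually-correct code scores. The `run_hidden_tests` helper and the `solve` entry-point name are assumptions for the sketch, not the actual harness used for these benchmarks.

```python
def run_hidden_tests(solution_src: str, cases) -> int:
    """Execute candidate source, then count how many (input, expected) cases pass.
    Assumes the candidate defines a function named `solve` (a convention chosen
    for this sketch; the real harness is not shown in this card)."""
    ns = {}
    exec(solution_src, ns)  # run the generated code to define `solve`
    return sum(ns["solve"](*inp) == expected for inp, expected in cases)

# Toy example: one passing case, one failing case
src = "def solve(a, b):\n    return a + b\n"
print(run_hidden_tests(src, [((1, 2), 3), ((2, 2), 5)]))  # 1
```

A real harness would additionally sandbox execution and enforce timeouts, but the scoring principle is the same.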
## Key Findings
- Layers 24-27 (75-84% depth) are the "reasoning core" in this architecture
- Layers 18-21 (56-65%) are a "danger zone": duplicating them drops the score to 2/10
- Stacking multiple circuits or tripling the best one makes things worse
- A minimum of 4 layers is needed; duplicating only 1-2 layers hurts rather than helps
- The danger zone at ~50% depth appears in every architecture tested (dense, MoE, hybrid)
- Cross-model layer transplant does NOT work โ matching dimensions isn't enough
- Hybrid architectures (Mamba+MoE+Attention) are completely intolerant of duplication
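The depth percentages above fall out of the layer indices directly: fractional depth is just the layer index over the total layer count of the 32-layer base. A quick sketch (`layer_depth` is a hypothetical helper, not part of any library):

```python
def layer_depth(layer_idx: int, num_layers: int = 32) -> float:
    """Fractional depth of a layer in the stack (index / total layers)."""
    return layer_idx / num_layers

# Reasoning core, layers 24-27 of 32:
print([round(layer_depth(i) * 100) for i in (24, 25, 26, 27)])  # [75, 78, 81, 84]
# Danger zone, layers 18-21 of 32:
print([round(layer_depth(i) * 100) for i in (18, 19, 20, 21)])  # [56, 59, 62, 66]
```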
## Usage
```python
from mlx_lm import load, generate

model, tokenizer = load("RockTalk/Qwen3.5-9B-Franken-L24-27")
response = generate(model, tokenizer, prompt="Write a function...", max_tokens=500)
print(response)
```
~9% slower than the 32-layer base due to 4 extra layers.
## How it was made
Layer weights 24-27 were duplicated and appended at the same position, shifting all subsequent layers forward. Config updated to 36 layers. No training, no optimization, no fine-tuning.
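The surgery described above can be sketched on a generic list of layer modules. `duplicate_layers` is a hypothetical helper for illustration; the actual MLX weight manipulation and config rewrite are not shown here.

```python
import copy

def duplicate_layers(layers, start, end):
    """Copy layers[start:end+1] and insert the copies immediately after the
    originals, shifting all subsequent layers forward (32 -> 36 here)."""
    block = [copy.deepcopy(layer) for layer in layers[start:end + 1]]
    return layers[:end + 1] + block + layers[end + 1:]

# Toy stand-in: integers in place of real layer modules
layers = list(range(32))                 # layers 0..31
new_layers = duplicate_layers(layers, 24, 27)
print(len(new_layers))                   # 36
print(new_layers[24:32])                 # [24, 25, 26, 27, 24, 25, 26, 27]
```

The model config's layer count would then be updated to match the new stack length, with no training pass afterwards.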
Base model: lukey03/Qwen3.5-9B-abliterated-MLX-4bit
Drew Smith, Rocktalk Research
All experiments run on Mac Studio M3 Ultra (512GB) using MLX. No cloud compute. Just surgery.