Qwen3.5-9B-Franken-L24-27

A frankenmerged Qwen3.5-9B with layers 24-27 duplicated (32 → 36 layers). No retraining, just layer surgery.

Result: 4/10 → 7/10 on coding benchmarks. 75% capability improvement from copying 4 layers.

What is this?

This model was created by duplicating layers 24-27 (the "reasoning core" at 75-84% depth) of a Qwen3.5-9B-abliterated model. The duplicated layers give the model a second pass through its strongest reasoning circuit before generating output.

Based on research across 6 model architectures and 50+ experiments mapping where functional circuits live in transformers. Full writeup: r/LocalLLaMA post

Benchmark Results

15 LeetCode problems, 3 tiers, code executed against hidden test cases (not LLM-judged):

Model                      Score   Speed
Qwen3.5-9B (original)      4/10    112 tok/s
This model (L24-27 dup)    7/10    ~102 tok/s

Problems gained: three_sum, word_break, longest_common_prefix. Nothing lost from baseline.
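The harness itself isn't published, but execution-based scoring of this kind is simple to sketch: run the model's generated code against hidden test cases and count only exact passes. Everything below (function name, toy problem) is a hypothetical illustration, not the actual benchmark code.

```python
# Hypothetical sketch of execution-based scoring: run model-generated
# code against hidden test cases instead of judging it with an LLM.

def score_solution(source: str, func_name: str, tests: list[tuple]) -> bool:
    """Return True only if the generated function passes every hidden test."""
    namespace = {}
    try:
        exec(source, namespace)  # load the generated code
        fn = namespace[func_name]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False  # crashes and syntax errors count as failures

# Toy example standing in for a LeetCode-style problem:
generated = "def add(a, b):\n    return a + b"
hidden_tests = [((1, 2), 3), ((-1, 1), 0)]
print(score_solution(generated, "add", hidden_tests))  # True
```

Scoring by execution rather than LLM judgment removes grader bias: a solution either passes the hidden tests or it doesn't.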

Key Findings

  • Layers 24-27 (75-84% depth) are the "reasoning core" in this architecture
  • Layers 18-21 (56-65%) are a "danger zone": duplicating them drops the score to 2/10
  • Stacking multiple circuits or tripling the best one makes things worse
  • Minimum of 4 layers needed; duplicating only 1-2 layers hurts rather than helps
  • The danger zone at ~50% depth appears in every architecture tested (dense, MoE, hybrid)
  • Cross-model layer transplant does NOT work; matching dimensions isn't enough
  • Hybrid architectures (Mamba+MoE+Attention) are completely intolerant of duplication
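The depth percentages above are just layer index over total layer count. A small helper (my own, not from the original experiments) makes the mapping explicit for this 32-layer model:

```python
def depth_range(first: int, last: int, n_layers: int = 32) -> tuple[int, int]:
    """Map a 0-indexed layer span to its percent-depth range (floored)."""
    return (100 * first // n_layers, 100 * last // n_layers)

print(depth_range(24, 27))  # (75, 84): the "reasoning core"
print(depth_range(18, 21))  # (56, 65): the "danger zone"
```

This makes it easy to translate the findings to architectures with different layer counts, since the circuits are located by relative depth rather than absolute index.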

Usage

from mlx_lm import load, generate

model, tokenizer = load("RockTalk/Qwen3.5-9B-Franken-L24-27")
response = generate(model, tokenizer, prompt="Write a function...", max_tokens=500)
print(response)

~9% slower than the 32-layer base due to the 4 extra layers.

How it was made

The weights of layers 24-27 were duplicated and the copies inserted immediately after the originals, shifting all subsequent layers back by four positions. The config was updated from 32 to 36 layers. No training, no optimization, no fine-tuning.
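The key remapping this surgery implies can be sketched on a toy flat weight dict. This is a simplified illustration, not the actual merge script: it assumes keys follow the common `model.layers.{i}.<param>` convention, and uses a 6-layer toy model instead of real safetensors.

```python
import re

def duplicate_layers(weights: dict, first: int, last: int) -> dict:
    """Duplicate layers first..last (inclusive), inserting the copies
    right after the originals and shifting later layers back."""
    span = last - first + 1
    pat = re.compile(r"model\.layers\.(\d+)\.(.+)")
    out = {}
    for key, value in weights.items():
        m = pat.match(key)
        if not m:
            out[key] = value  # non-layer weights (embeddings, head) unchanged
            continue
        idx, rest = int(m.group(1)), m.group(2)
        if idx <= last:
            out[key] = value  # layers up to `last` keep their position
            if first <= idx:
                # the copy of the duplicated span lands right after the original
                out[f"model.layers.{idx + span}.{rest}"] = value
        else:
            # everything after the span shifts back by `span` positions
            out[f"model.layers.{idx + span}.{rest}"] = value
    return out

# 6-layer toy model, duplicating layers 2-3 -> 8 layers
toy = {f"model.layers.{i}.w": i for i in range(6)}
toy["lm_head.w"] = "head"
merged = duplicate_layers(toy, 2, 3)
```

After remapping, the layer count in the model config would be bumped accordingly (here 6 → 8; for the real model, 32 → 36).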

Base model: lukey03/Qwen3.5-9B-abliterated-MLX-4bit

Drew Smith โ€” Rocktalk Research

All experiments run on Mac Studio M3 Ultra (512GB) using MLX. No cloud compute. Just surgery.
