# Qwen3.5-9B-Franken-L24-27
A frankenmerged Qwen3.5-9B with layers 24-27 duplicated (32 → 36 layers). No retraining, just layer surgery.

Result: 4/10 → 7/10 on coding benchmarks. A 75% capability improvement from copying 4 layers.
## What is this?
This model was created by duplicating layers 24-27 (the "reasoning core" at 75-84% depth) of a Qwen3.5-9B-abliterated model. The duplicated layers give the model a second pass through its strongest reasoning circuit before generating output.
Based on research across 6 model architectures and 50+ experiments mapping where functional circuits live in transformers. Full writeup: r/LocalLLaMA post
## Benchmark Results
15 LeetCode problems, 3 tiers, code executed against hidden test cases (not LLM-judged):
| Model | Score | Speed |
|---|---|---|
| Qwen3.5-9B (original) | 4/10 | 112 tok/s |
| This model (L24-27 dup) | 7/10 | ~102 tok/s |
Problems gained: three_sum, word_break, longest_common_prefix. Nothing lost from baseline.
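For illustration, here is a minimal sketch of what "code executed against hidden test cases (not LLM-judged)" means: the generated source is executed and its function is called on each case, so only actually-correct code scores. The `run_hidden_tests` helper and the `solve` entry-point name are assumptions for the sketch, not the actual harness used for these benchmarks.

```python
def run_hidden_tests(solution_src: str, cases) -> int:
    """Execute candidate source, then count how many (input, expected) cases pass.
    Assumes the candidate defines a function named `solve` (a convention chosen
    for this sketch; the real harness is not shown in this card)."""
    ns = {}
    exec(solution_src, ns)  # run the generated code to define `solve`
    return sum(ns["solve"](*inp) == expected for inp, expected in cases)

# Toy example: one passing case, one failing case
src = "def solve(a, b):\n    return a + b\n"
print(run_hidden_tests(src, [((1, 2), 3), ((2, 2), 5)]))  # 1
```

A real harness would additionally sandbox execution and enforce timeouts, but the scoring principle is the same.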
## Key Findings
- Layers 24-27 (75-84% depth) are the "reasoning core" in this architecture
- Layers 18-21 (56-65%) are a "danger zone": duplicating them drops the score to 2/10
- Stacking multiple circuits or tripling the best one makes things worse
- A minimum of 4 layers is needed; duplicating only 1-2 layers hurts rather than helps
- The danger zone at ~50% depth appears in every architecture tested (dense, MoE, hybrid)
- Cross-model layer transplant does NOT work โ matching dimensions isn't enough
- Hybrid architectures (Mamba+MoE+Attention) are completely intolerant of duplication
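The depth percentages above fall out of the layer indices directly: fractional depth is just the layer index over the total layer count of the 32-layer base. A quick sketch (`layer_depth` is a hypothetical helper, not part of any library):

```python
def layer_depth(layer_idx: int, num_layers: int = 32) -> float:
    """Fractional depth of a layer in the stack (index / total layers)."""
    return layer_idx / num_layers

# Reasoning core, layers 24-27 of 32:
print([round(layer_depth(i) * 100) for i in (24, 25, 26, 27)])  # [75, 78, 81, 84]
# Danger zone, layers 18-21 of 32:
print([round(layer_depth(i) * 100) for i in (18, 19, 20, 21)])  # [56, 59, 62, 66]
```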
## Usage
```python
from mlx_lm import load, generate

model, tokenizer = load("RockTalk/Qwen3.5-9B-Franken-L24-27")
response = generate(model, tokenizer, prompt="Write a function...", max_tokens=500)
print(response)
```
~9% slower than the 32-layer base due to 4 extra layers.
## How it was made
Layer weights 24-27 were duplicated and appended at the same position, shifting all subsequent layers forward. Config updated to 36 layers. No training, no optimization, no fine-tuning.
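The surgery described above can be sketched on a generic list of layer modules. `duplicate_layers` is a hypothetical helper for illustration; the actual MLX weight manipulation and config rewrite are not shown here.

```python
import copy

def duplicate_layers(layers, start, end):
    """Copy layers[start:end+1] and insert the copies immediately after the
    originals, shifting all subsequent layers forward (32 -> 36 here)."""
    block = [copy.deepcopy(layer) for layer in layers[start:end + 1]]
    return layers[:end + 1] + block + layers[end + 1:]

# Toy stand-in: integers in place of real layer modules
layers = list(range(32))                 # layers 0..31
new_layers = duplicate_layers(layers, 24, 27)
print(len(new_layers))                   # 36
print(new_layers[24:32])                 # [24, 25, 26, 27, 24, 25, 26, 27]
```

The model config's layer count would then be updated to match the new stack length, with no training pass afterwards.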
Base model: lukey03/Qwen3.5-9B-abliterated-MLX-4bit
Drew Smith, Rocktalk Research
All experiments run on Mac Studio M3 Ultra (512GB) using MLX. No cloud compute. Just surgery.