# Qwen3.5-Creative-18B-A3B
A creative-writing-optimized pruning of Qwen/Qwen3.5-35B-A3B using REAP.
50% of MoE experts pruned (256 → 128) using a creative writing calibration dataset. The result: a model that fits in ~12GB at Q4_K_M while retaining strong creative writing capability, allowing comfortable deployment on 16GB cards.
## What is this?
| | Base Model | This Model |
|---|---|---|
| Total params | ~35B | ~18B |
| Active params/token | ~3B | ~3B |
| MoE experts | 256 | 128 |
| Q4_K_M GGUF | ~21GB | ~12GB |
| Target VRAM | 24GB+ | 16-24GB |
## How it was made
- Calibration dataset: 3000 samples → 1000 each from WritingPrompts, Project Gutenberg, and Roleplay scenarios (Timersofc/creative-writing-reap-calibration)
- REAP profiling: Router-weighted expert activation norms recorded across all 40 MoE layers
- Pruning: Bottom 50% of experts by REAP score removed globally
- Quantization: imatrix-aware GGUF quantization using the same creative writing calibration data
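The profiling and pruning steps above can be sketched as follows. This is a toy illustration of router-weighted expert scoring, not the Cerebras REAP implementation: the array shapes, the `router_probs`/`expert_outputs` names, and the single-layer uniform 50% cut are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, n_experts, d_model = 512, 16, 64

# Toy stand-ins for quantities recorded during calibration profiling
# (names assumed): per-token router gate weights and expert output vectors.
router_probs = rng.random((n_tokens, n_experts))
router_probs /= router_probs.sum(axis=1, keepdims=True)
expert_outputs = rng.standard_normal((n_tokens, n_experts, d_model))

# REAP-style saliency: router-weighted expert activation norms,
# averaged over the calibration tokens.
norms = np.linalg.norm(expert_outputs, axis=-1)   # (tokens, experts)
scores = (router_probs * norms).mean(axis=0)      # (experts,)

# Keep the top 50% of experts by score; the bottom 50% are pruned.
keep = np.argsort(scores)[n_experts // 2:]
print(f"kept {len(keep)}/{n_experts} experts")
```

In the real pipeline this scoring is repeated per MoE layer across all 40 layers, and the surviving experts' weights are copied into the smaller checkpoint.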
## Usage notes
- CoT/Thinking mode: Prefill the assistant turn with `<think>\nOkay,` for stable chain-of-thought reasoning. Without the nudge, the model tends to skip reasoning.
- Creative writing is the sweet spot: prose, dialogue, worldbuilding, character work.
- Reasoning works but can be inconsistent (~65% stable CoT). For reliability, see the less aggressive 25% prune: Timersofc/Qwen3.5-Creative-26B-A3B
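The `<think>` prefill above can be sketched as a prompt built to end inside the assistant turn, so generation continues from the nudge. This assumes a ChatML-style template like the base Qwen models use; check the model's bundled chat template for the exact tokens.

```python
# Build a prompt that ends mid-assistant-turn with the "<think>\nOkay," nudge,
# so the model continues its chain of thought rather than skipping it.
def build_prompt(user_msg: str) -> str:
    return (
        "<|im_start|>user\n"
        f"{user_msg}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\nOkay,"  # prefill: generation resumes from here
    )

prompt = build_prompt("Write a short scene set in a lighthouse.")
print(prompt)
```

With an OpenAI-compatible server, the same effect is achieved by sending a trailing assistant message containing `<think>\nOkay,` and letting the model complete it.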
## GGUF quantizations
Available in Timersofc/Qwen3.5-Creative-18B-A3B-GGUF:
- `Q4_K_M` (imatrix) → ~12GB, recommended for 16-24GB VRAM
- `Q6_K` (imatrix) → ~15GB, higher quality
- `f16` → full-precision GGUF for custom quantization
All quantizations use an importance matrix generated from the same creative writing calibration dataset used for REAP profiling. This means bit allocation within each tensor is optimized for creative writing: weights that matter most for prose quality get higher precision.
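The effect of importance weighting can be shown with a toy scalar quantizer: choosing the quantization scale to minimize importance-weighted error, rather than plain squared error, shifts precision toward the weights the calibration data marks as important. This is a simplified sketch, not llama.cpp's actual imatrix algorithm; the importance values here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(256)       # one tensor's weights
imp = rng.random(256) ** 4         # synthetic per-weight importance scores

def quantize(w, scale, bits=4):
    # Symmetric round-to-nearest quantization at a given scale.
    lim = 2 ** (bits - 1)
    q = np.clip(np.round(w / scale), -lim, lim - 1)
    return q * scale

def weighted_err(scale):
    return np.sum(imp * (w - quantize(w, scale)) ** 2)

def plain_err(scale):
    return np.sum((w - quantize(w, scale)) ** 2)

# Grid-search the scale under each objective.
scales = np.linspace(0.01, 1.0, 200)
plain_scale = min(scales, key=plain_err)
imx_scale = min(scales, key=weighted_err)

# The importance-aware scale never does worse on the weighted error.
print(weighted_err(imx_scale) <= weighted_err(plain_scale))  # → True
```

The real imatrix machinery applies this idea per tensor block during GGUF quantization, with importance accumulated from actual forward passes over the calibration text.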
## Credits
- Qwen team for the base model
- Cerebras Research for the REAP method
- REAP fork with Qwen3.5 patches: janmts/reap
## License
Same as the base model. This is an unofficial community variant, not affiliated with Alibaba or Cerebras.