Qwen3.5-Creative-18B-A3B

A creative-writing-optimized pruning of Qwen/Qwen3.5-35B-A3B using REAP.

50% of the MoE experts pruned (256 → 128) using a creative-writing calibration dataset. The result: a model that fits in ~12GB at Q4_K_M while retaining strong creative-writing capability, allowing comfortable deployment on 16GB cards.

What is this?

                      Base model   This model
Total params          ~35B         ~18B
Active params/token   ~3B          ~3B
MoE experts           256          128
Q4_K_M GGUF           ~21GB        ~12GB
Target VRAM           24GB+        16-24GB
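The ~12GB figure in the table is consistent with a back-of-envelope check, assuming Q4_K_M averages roughly 4.8 bits per weight (the exact ratio varies with the tensor mix):

```python
# Rough GGUF size estimate for an 18B-parameter model at Q4_K_M.
# bits_per_weight = 4.8 is an assumption, not an exact figure.
params = 18e9
bits_per_weight = 4.8
size_gb = params * bits_per_weight / 8 / 1e9
print(round(size_gb, 1))  # ~10.8 GB of weights; metadata and overhead bring the file to ~12GB
```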

How it was made

  1. Calibration dataset: 3,000 samples, 1,000 each from WritingPrompts, Project Gutenberg, and roleplay scenarios (Timersofc/creative-writing-reap-calibration)
  2. REAP profiling: Router-weighted expert activation norms recorded across all 40 MoE layers
  3. Pruning: Bottom 50% of experts by REAP score removed globally
  4. Quantization: imatrix-aware GGUF quantization using the same creative writing calibration data
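The profiling and pruning steps (2-3) can be sketched as follows. This is a toy illustration, not the actual REAP implementation: the layer shapes, the top-2 routing, and the random weights are all stand-ins, and the saliency formula shown (mean of router gate weight times expert output norm over routed tokens) is one plausible reading of "router-weighted expert activation norms":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a single MoE layer: 8 experts, top-2 routing.
NUM_EXPERTS, TOP_K, HIDDEN, TOKENS = 8, 2, 16, 512

w_router = rng.normal(size=(HIDDEN, NUM_EXPERTS))
w_expert = rng.normal(size=(NUM_EXPERTS, HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)

def router(x):
    # Softmax over expert logits -> gate weights per token.
    logits = x @ w_router
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros(NUM_EXPERTS)  # accumulated router-weighted activation norms
counts = np.zeros(NUM_EXPERTS)  # how many tokens were routed to each expert

x = rng.normal(size=(TOKENS, HIDDEN))          # calibration activations
gates = router(x)                              # (TOKENS, NUM_EXPERTS)
topk = np.argsort(-gates, axis=-1)[:, :TOP_K]  # experts routed per token

for t in range(TOKENS):
    for j in topk[t]:
        out = x[t] @ w_expert[j]               # expert output f_j(x)
        scores[j] += gates[t, j] * np.linalg.norm(out)
        counts[j] += 1

saliency = scores / np.maximum(counts, 1)      # mean router-weighted norm
keep = np.argsort(-saliency)[: NUM_EXPERTS // 2]  # keep top 50% of experts
print(sorted(keep.tolist()))
```

In the real pipeline this profiling runs across all 40 MoE layers, and the bottom half of experts by score is removed globally.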

Usage notes

  • CoT/Thinking mode: Prefill the assistant turn with "<think>\nOkay," for stable chain-of-thought reasoning. Without this nudge, the model tends to skip reasoning.
  • Creative writing is the sweet spot: prose, dialogue, worldbuilding, character work.
  • Reasoning works but can be inconsistent (~65% stable CoT). For reliability, see the less aggressive 25% prune: Timersofc/Qwen3.5-Creative-26B-A3B
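The thinking-mode prefill can be done by appending the nudge to an already-opened assistant turn before generation. A minimal sketch, assuming a Qwen-style ChatML prompt format (the exact template tokens should be taken from the model's own chat template):

```python
# Build a prompt whose assistant turn is prefilled with "<think>\nOkay,"
# so the model continues inside the thinking block instead of skipping it.
def build_prompt(user_msg: str) -> str:
    return (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n<think>\nOkay,"  # prefilled CoT nudge
    )

prompt = build_prompt("Write a short scene set in a lighthouse.")
# Feed `prompt` to the model raw (no generation prompt appended);
# generation then continues from "Okay," inside the <think> block.
```

With the transformers library, the same effect can be had by ending the messages list with a partial assistant message and using the chat template's continue-final-message behavior rather than hand-building the string.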

GGUF quantizations

Available in Timersofc/Qwen3.5-Creative-18B-A3B-GGUF:

  • Q4_K_M (imatrix): ~12GB, recommended for 16-24GB VRAM
  • Q6_K (imatrix): ~15GB, higher quality
  • f16 โ€” full precision GGUF for custom quantization

All quantizations use an importance matrix generated from the same creative writing calibration dataset used for REAP profiling. This means bit allocation within each tensor is optimized for creative writing โ€” weights that matter most for prose quality get higher precision.
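This workflow maps onto llama.cpp's imatrix tooling roughly as follows. File names here are placeholders, and the exact flags should be checked against the llama.cpp version in use:

```shell
# 1. Record per-tensor activation statistics over the calibration text
llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize with the importance matrix guiding bit allocation
llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```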

Credits

License

Same as the base model. This is an unofficial community variant, not affiliated with Alibaba or Cerebras.
