# Qwen3.5-Creative-18B-A3B
A creative-writing-optimized pruning of Qwen/Qwen3.5-35B-A3B using REAP.
50% of MoE experts pruned (256 → 128) using a creative writing calibration dataset. The result: a model that fits in ~12GB at Q4_K_M while retaining strong creative writing capability, allowing comfortable deployment on 16GB cards.
## What is this?
| | Base Model | This Model |
|---|---|---|
| Total params | ~35B | ~18B |
| Active params/token | ~3B | ~3B |
| MoE experts | 256 | 128 |
| Q4_K_M GGUF | ~21GB | ~12GB |
| Target VRAM | 24GB+ | 16-24GB |
## How it was made
- Calibration dataset: 3000 samples → 1000 each from WritingPrompts, Project Gutenberg, and Roleplay scenarios (Timersofc/creative-writing-reap-calibration)
- REAP profiling: Router-weighted expert activation norms recorded across all 40 MoE layers
- Pruning: Bottom 50% of experts by REAP score removed globally
- Quantization: imatrix-aware GGUF quantization using the same creative writing calibration data
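The profiling and pruning steps above can be sketched as follows. This is a toy illustration of router-weighted expert scoring, not the Cerebras REAP implementation: the array shapes, the `router_probs`/`expert_outputs` names, and the single-layer uniform 50% cut are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, n_experts, d_model = 512, 16, 64

# Toy stand-ins for quantities recorded during calibration profiling
# (names assumed): per-token router gate weights and expert output vectors.
router_probs = rng.random((n_tokens, n_experts))
router_probs /= router_probs.sum(axis=1, keepdims=True)
expert_outputs = rng.standard_normal((n_tokens, n_experts, d_model))

# REAP-style saliency: router-weighted expert activation norms,
# averaged over the calibration tokens.
norms = np.linalg.norm(expert_outputs, axis=-1)   # (tokens, experts)
scores = (router_probs * norms).mean(axis=0)      # (experts,)

# Keep the top 50% of experts by score; the bottom 50% are pruned.
keep = np.argsort(scores)[n_experts // 2:]
print(f"kept {len(keep)}/{n_experts} experts")
```

In the real pipeline this scoring is repeated per MoE layer across all 40 layers, and the surviving experts' weights are copied into the smaller checkpoint.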
## Usage notes
- CoT/Thinking mode: Prefill the assistant turn with `<think>\nOkay,` for stable chain-of-thought reasoning. Without the nudge, the model tends to skip reasoning.
- Creative writing is the sweet spot: prose, dialogue, worldbuilding, character work.
- Reasoning works but can be inconsistent (~65% stable CoT). For reliability, see the less aggressive 25% prune: Timersofc/Qwen3.5-Creative-26B-A3B
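The `<think>` prefill above can be sketched as a prompt built to end inside the assistant turn, so generation continues from the nudge. This assumes a ChatML-style template like the base Qwen models use; check the model's bundled chat template for the exact tokens.

```python
# Build a prompt that ends mid-assistant-turn with the "<think>\nOkay," nudge,
# so the model continues its chain of thought rather than skipping it.
def build_prompt(user_msg: str) -> str:
    return (
        "<|im_start|>user\n"
        f"{user_msg}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\nOkay,"  # prefill: generation resumes from here
    )

prompt = build_prompt("Write a short scene set in a lighthouse.")
print(prompt)
```

With an OpenAI-compatible server, the same effect is achieved by sending a trailing assistant message containing `<think>\nOkay,` and letting the model complete it.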
## GGUF quantizations
Available in Timersofc/Qwen3.5-Creative-18B-A3B-GGUF:
- `Q4_K_M` (imatrix) → ~12GB, recommended for 16-24GB VRAM
- `Q6_K` (imatrix) → ~15GB, higher quality
- `f16` → full-precision GGUF for custom quantization
All quantizations use an importance matrix generated from the same creative writing calibration dataset used for REAP profiling. This means bit allocation within each tensor is optimized for creative writing: weights that matter most for prose quality get higher precision.
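The effect of importance weighting can be shown with a toy scalar quantizer: choosing the quantization scale to minimize importance-weighted error, rather than plain squared error, shifts precision toward the weights the calibration data marks as important. This is a simplified sketch, not llama.cpp's actual imatrix algorithm; the importance values here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(256)       # one tensor's weights
imp = rng.random(256) ** 4         # synthetic per-weight importance scores

def quantize(w, scale, bits=4):
    # Symmetric round-to-nearest quantization at a given scale.
    lim = 2 ** (bits - 1)
    q = np.clip(np.round(w / scale), -lim, lim - 1)
    return q * scale

def weighted_err(scale):
    return np.sum(imp * (w - quantize(w, scale)) ** 2)

def plain_err(scale):
    return np.sum((w - quantize(w, scale)) ** 2)

# Grid-search the scale under each objective.
scales = np.linspace(0.01, 1.0, 200)
plain_scale = min(scales, key=plain_err)
imx_scale = min(scales, key=weighted_err)

# The importance-aware scale never does worse on the weighted error.
print(weighted_err(imx_scale) <= weighted_err(plain_scale))  # → True
```

The real imatrix machinery applies this idea per tensor block during GGUF quantization, with importance accumulated from actual forward passes over the calibration text.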
## Credits
- Qwen team for the base model
- Cerebras Research for the REAP method
- REAP fork with Qwen3.5 patches: janmts/reap
## License
Same as the base model. This is an unofficial community variant, not affiliated with Alibaba or Cerebras.