# NaNovel-35B-A3B
NaNovel-35B-A3B is the sparse Mixture-of-Experts entry in the Novelist series. It targets users who want a larger writing model with MoE-style capacity while keeping per-token activation lower than a fully dense model of similar total size.
## Novelist Series
- Base models: Qwen3.5-9B, Qwen3.5-27B, Qwen3.5-35B-A3B
- Autoregressive models: NaNovel-9B, NaNovel-27B, NaNovel-35B-A3B
- Diffusion models coming soon.
## Model Overview
NaNovel-35B-A3B was fine-tuned on Dxniz/Novelist-CoT with the repository's dedicated MoE training pipeline. The goal is to combine creative-writing specialization with the routing behavior of the base Qwen3.5-35B-A3B architecture. This model is aimed at users exploring the upper end of the Novelist line who specifically want the characteristics of the MoE base.
The local training code describes the base as a 35B-total model with roughly 3B active parameters per token. It is trained for long-context fiction work, narrative reasoning, and literary instruction following.
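As a rough illustration of what the sparse activation means in practice (using only the 35B total / ~3B active figures stated above; exact counts depend on the released config), the per-token activation fraction works out to well under a tenth of the total parameter count:

```python
# Back-of-the-envelope comparison, assuming the stated 35B total
# and ~3B active parameters per token.
total_params = 35e9
active_params = 3e9

active_fraction = active_params / total_params
print(f"active fraction per token: {active_fraction:.1%}")  # ~8.6%
```

This is why an MoE model of this total size can have per-token compute closer to a small dense model while retaining larger total capacity.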
## Evaluation
This model was evaluated with the Dxniz/Novelist-Bench benchmark dataset.
The repository evaluation summaries show the following results for NaNovel-35B-A3B:
Overall and detailed evaluation result tables are published in the repository's evaluation summaries.
In the current repository measurements, this model improves when explicit thinking is enabled and shows its strongest relative behavior in grammar editing, worldbuilding, style control, and character-driven prose. The 27B dense model still scores higher overall in these summaries, so this card should be read as an MoE-focused option rather than the default best choice for every user.
## Recommended Use
- Long-context scene and chapter generation
- Users specifically interested in the Qwen3.5 MoE base
- Prose generation where style and atmosphere matter more than benchmark-leading balance
- Creative experiments comparing dense and MoE Novelist behavior
## Limitations
- Current local evaluation summaries place it below NaNovel-27B overall
- Hardware and software expectations are more demanding than for the 9B and 27B dense variants
- MoE deployment details vary by stack; inference setup should be validated in your environment
- Generated text should be reviewed before publication or downstream automation
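As a minimal sketch of the "validate your inference setup" point, a deployment script can sanity-check the MoE routing fields before loading weights. The field names `num_experts` and `num_experts_per_tok` follow the Qwen MoE convention and are an assumption here; verify them against the released `config.json`:

```python
# Minimal pre-deployment sanity check for an MoE config dict
# (e.g. the parsed contents of config.json).
# Field names are assumed from the Qwen MoE convention.
def check_moe_config(config: dict) -> list[str]:
    problems = []
    experts = config.get("num_experts")
    active = config.get("num_experts_per_tok")
    if experts is None or active is None:
        problems.append("config does not expose MoE routing fields")
    elif active > experts:
        problems.append(f"active experts ({active}) exceed total experts ({experts})")
    return problems

# Example with a hypothetical config:
print(check_moe_config({"num_experts": 128, "num_experts_per_tok": 8}))  # []
```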
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dxniz/NaNovel-35B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Novelist, a creative writing assistant."},
    {"role": "user", "content": "Write a baroque dark fantasy scene set inside a flooded cathedral archive."},
]

# Build the prompt tensor using the model's chat template.
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1600,
    temperature=0.7,
    top_p=0.85,
    top_k=20,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
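For hardware planning, a rough lower bound on weight memory follows from the total parameter count (bf16 stores 2 bytes per parameter; activations, KV cache, and runtime overhead add more on top). The 35B figure is taken from the model name and is an approximation:

```python
# Back-of-the-envelope weight-memory estimate for bf16 loading.
# Actual usage also includes activations, KV cache, and framework overhead.
params = 35e9
bytes_per_param = 2  # bfloat16

weight_gib = params * bytes_per_param / 1024**3
print(f"approximate weight memory: {weight_gib:.0f} GiB")  # ~65 GiB
```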
## License
Apache 2.0, consistent with the base model license.

