Qwen3.5-27B-Heretic-Marvin-V1

A creative-writing style fine-tune of Qwen3.5-27B, built on an uncensored base and trained with anti-repetition DPO.

Model Stack

  1. Base: Qwen/Qwen3.5-27B
  2. Uncensored: llmfan46/Qwen3.5-27B-heretic-v2 (refusal suppression)
  3. Anti-repetition: DPO training (611 preference pairs, 1 epoch) to reduce degenerate repetition loops
  4. Style SFT: Marvin style bible fine-tune (4478 samples, 17.2M tokens, 1 epoch) with 1.5x LoRA scaling for stronger style influence

Training Details

Anti-Repetition DPO

  • QLoRA r=32, alpha=16, RSLoRA
  • 611 chosen/rejected pairs, 1 epoch
  • Targets catastrophic repetition loops while preserving base-model capabilities
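The DPO stage trains on chosen/rejected pairs (non-repetitive vs. repetitive completions). Below is a minimal sketch of the per-pair DPO objective; the beta value and the log-probability inputs are illustrative, not taken from this card's training run.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed token log-probability of a full response
    under the policy (or frozen reference) model.
    """
    # Implicit reward margin: how much more the policy prefers `chosen`
    # over `rejected`, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)): small when the policy already prefers chosen.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy prefers the chosen (non-repetitive) completion -> low loss.
low = dpo_loss(-10.0, -40.0, ref_logp_chosen=-20.0, ref_logp_rejected=-30.0)
# Policy prefers the rejected (repetitive) completion -> high loss.
high = dpo_loss(-40.0, -10.0, ref_logp_chosen=-30.0, ref_logp_rejected=-20.0)
assert low < high
```

Minimizing this loss pushes probability mass away from the repetitive completions without an explicit reward model, which is why a small pair set (611 here) can suffice.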

Style SFT

  • QLoRA r=32, alpha=16, RSLoRA
  • 4478 samples, ~17.2M tokens, 1 epoch, 6144 context
  • LoRA merged at 1.5x scaling (alpha effectively 24) for stronger style transfer
  • Hardware: 2× RTX 3090
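The 1.5x merge scaling amounts to multiplying the LoRA delta before adding it to the base weights. A toy sketch with made-up dimensions, assuming the RSLoRA scaling factor alpha/sqrt(r); either way, scaling the delta by 1.5 is algebraically the same as merging with alpha = 24:

```python
import numpy as np

rng = np.random.default_rng(0)

r, alpha, scale = 32, 16, 1.5   # rank/alpha from this card; 1.5x merge scaling
d_out, d_in = 8, 8              # toy dimensions for illustration

W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # LoRA "down" projection
B = rng.standard_normal((d_out, r))      # LoRA "up" projection

# RSLoRA merge: W' = W + (alpha / sqrt(r)) * B @ A
delta = (alpha / np.sqrt(r)) * (B @ A)
merged_scaled = W + scale * delta
# Equivalent merge with alpha = 24 and no extra scaling.
merged_alpha24 = W + (24 / np.sqrt(r)) * (B @ A)

assert np.allclose(merged_scaled, merged_alpha24)
```

Over-scaling a merge trades some general capability for stronger style transfer, which is presumably why the card notes the effective alpha explicitly.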

Key Properties

  • Uncensored/unrefused (from heretic-v2 base)
  • Anti-repetition (from DPO stage)
  • Distinctive creative writing voice: wry, observational, sensory-rich prose
  • Technical capability preserved
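To prompt the model, Qwen chat models use a ChatML-style template. The sketch below builds that format by hand so the token layout is visible; in practice you would use `tokenizer.apply_chat_template` from transformers instead. The system message is a hypothetical example, not a prompt shipped with this model.

```python
def build_chatml_prompt(messages):
    """Build a ChatML-style prompt string (the format Qwen chat models use).

    `messages` is a list of {"role", "content"} dicts; the final assistant
    header is left open so the model continues from there.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a wry, observational narrator."},
    {"role": "user", "content": "Describe a rainy bus stop in two sentences."},
])
```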