Qwen3.5-27B-Heretic-Marvin-V1
A creative-writing style fine-tune of Qwen3.5-27B with anti-repetition DPO and uncensored behavior.
Model Stack
- Base: Qwen/Qwen3.5-27B
- Uncensored: llmfan46/Qwen3.5-27B-heretic-v2 (refusal suppression)
- Anti-repetition: DPO training (611 preference pairs, 1 epoch) to reduce degenerate repetition loops
- Style SFT: Marvin style bible fine-tune (4478 samples, 17.2M tokens, 1 epoch) with 1.5x LoRA scaling for stronger style influence
Training Details
Anti-Repetition DPO
- QLoRA r=32, alpha=16, RSLoRA
- 611 chosen/rejected pairs, 1 epoch
- Eliminates catastrophic repetition while preserving base model capabilities
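The DPO stage above optimizes the standard DPO objective over the 611 chosen/rejected pairs. A minimal sketch of the per-pair loss, in pure Python; the `beta` value and log-probabilities here are illustrative, not taken from the actual training config:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full completion
    under the trained policy (pi_*) or the frozen reference model (ref_*).
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen completion, relative to the reference model.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(beta * margin); minimized when the margin is large.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Pair where the policy already prefers the chosen completion:
loss_good = dpo_loss(-10.0, -30.0, -12.0, -25.0)
# Pair where it prefers the rejected (repetitive) one:
loss_bad = dpo_loss(-30.0, -10.0, -25.0, -12.0)
```

Training pushes each pair toward the `loss_good` regime, which is what discourages the repetitive completions used as rejected samples.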
Style SFT
- QLoRA r=32, alpha=16, RSLoRA
- 4478 samples, ~17.2M tokens, 1 epoch, 6144 context
- LoRA merged at 1.5x scaling (alpha effectively 24) for stronger style transfer
- Hardware: 2× RTX 3090
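Merging at 1.5× scaling means multiplying the adapter delta before adding it to the base weights, which is equivalent to raising alpha from 16 to 24 at r=32. A toy sketch of the arithmetic (the helper name and matrices are illustrative, not the actual merge code):

```python
def merge_lora(W, A, B, r, alpha, scale=1.5):
    """Return W + scale * (alpha / r) * (B @ A), using plain lists.

    W: out x in base weight; A: r x in; B: out x r low-rank factors.
    """
    factor = scale * alpha / r
    merged = [row[:] for row in W]
    for i in range(len(B)):
        for j in range(len(A[0])):
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += factor * delta
    return merged

# Toy example: 2x2 base weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]           # r x in = 1 x 2
B = [[2.0], [0.0]]         # out x r = 2 x 1
merged = merge_lora(W, A, B, r=1, alpha=1)
```

With `scale=1.0` this is the ordinary LoRA merge; the extra 1.5 factor simply amplifies the style adapter's contribution.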
Key Properties
- Uncensored/unrefused (from heretic-v2 base)
- Anti-repetition (from DPO stage)
- Distinctive creative writing voice: wry, observational, sensory-rich prose
- Technical capability preserved