---
base_model:
- Qwen/Qwen3.5-4B
library_name: transformers
license: apache-2.0
language:
- en
tags:
- qwen3.5
- ai-safety
- reasoning
- thinking
- alignment
- sft
- gguf
base_model_relation: finetune
pipeline_tag: text-generation
---
# Qwen3.5-4B-Safety-Thinking

**4B parameters • context extensible up to 1M tokens • safety reasoning**
[🤗 Model](https://huggingface.co/MerlinSafety/Qwen3.5-4B-Safety-Thinking) | [📖 arXiv in progress]
## Model Overview

This model has been specifically optimized to excel in several key areas:
- **Structured Reasoning Quality:** Enhanced ability to break down complex problems and think step-by-step.
- **Instruction Adherence:** Superior capability to follow strict guidelines and constraints provided in prompts.
- **Safety-Aligned Behavior:** Designed to operate safely in practical assistant and autonomous agent workflows.
- **Robustness:** Increased resistance against common misalignment patterns and adversarial inputs.

It leverages a rigorous post-training stack that combines supervised reasoning tuning with alignment-oriented optimization, focusing heavily on reliable behavior in real-world applications.
## Training Approach
- **Base Model:** `Qwen/Qwen3.5-4B`
- **Methodology:** LoRA-based Supervised Fine-Tuning (SFT) resulting in a merged BF16 checkpoint.
- **Reasoning Architecture:** Native support and normalization for the `<think>...</think>` format to explicitly separate the reasoning process from the final output.
- **Optimization Focus:** Enhancing safety reasoning, maximizing controllability, and ensuring response consistency.
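As a minimal sketch of how the two channels can be consumed downstream (assuming the reasoning is delimited with `<think>...</think>` tags, the convention used by Qwen thinking models; the helper name is our own):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, final_answer).

    Assumes the reasoning is wrapped in <think>...</think>; if the
    delimiters are absent, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is the user-facing answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>The user asks for 2+2; this is safe.</think>4"
)
```

Keeping the reasoning channel separate like this makes it easy to log or audit the chain of thought without showing it to end users.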
## Data
This model was trained on **Merlin Research private datasets** built from internal R&D pipelines for:
- reasoning reliability improvements,
- instruction-following robustness,
- safety behavior refinement,
- misalignment reduction in applied scenarios,
- behavioral alignment auditing using Anthropic's [Bloom&Petri framework](https://www.anthropic.com/research/petri-open-source-auditing).
## Intended Use Cases
This model is particularly well-suited for:
- Building safety-oriented reasoning assistants and chatbots.
- Tasks requiring strict, constrained instruction-following.
- Experimentation in AI alignment, safety research, and robustness testing.
- Agentic workflows where predictable and safe autonomous behavior is required.
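For constrained instruction-following, prompts ultimately render to Qwen's ChatML layout. A minimal sketch of that raw format is below (the helper and the example constraint are illustrative; in practice, prefer `tokenizer.apply_chat_template` from `transformers`, which produces this layout for you):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a raw ChatML-style prompt as used by Qwen-family
    chat templates. Illustrative only; real code should use the
    tokenizer's apply_chat_template method instead."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "Answer in exactly one sentence and refuse unsafe requests.",
    "Summarize what a LoRA adapter is.",
)
```

Placing hard constraints in the system turn, as above, is the pattern this model's instruction-adherence tuning targets.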
## GGUF Status
GGUF artifacts are currently **in active development** and validation.
At this stage, we recommend using the **BF16 Transformers checkpoint** for stable results.
Updated and fully validated GGUF builds will be published in future releases.

**For Ollama:**
```bash
ollama create qwen35-safety-thinking-bf16 -f Modelfile
ollama run qwen35-safety-thinking-bf16
```
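The `Modelfile` referenced above is not included in this snapshot. A minimal sketch might look like the following (the `FROM` path and sampling parameters are illustrative assumptions, not shipped values):

```shell
# Write a hypothetical minimal Ollama Modelfile; adjust FROM to point
# at your local GGUF export once validated builds are published.
cat > Modelfile <<'EOF'
FROM ./qwen3.5-4b-safety-thinking-bf16.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
SYSTEM You are a careful, safety-focused reasoning assistant.
EOF
```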
## Organization
Designed, developed, and maintained with ❤️ by **Merlin Research**.
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{qwen3.5-4b-safety-thinking,
  author       = {Merlin Research},
  title        = {Qwen3.5-4B-Safety-Thinking: A Reasoning and Safety Aligned Model},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MerlinSafety/Qwen3.5-4B-Safety-Thinking}},
  note         = {Base model: Qwen/Qwen3.5-4B}
}
```