---
base_model:
- Qwen/Qwen3.5-4B
library_name: transformers
license: apache-2.0
language:
- en
tags:
- qwen3.5
- ai-safety
- reasoning
- thinking
- alignment
- sft
- gguf
base_model_relation: finetune
pipeline_tag: text-generation
---
# Qwen3.5-4B-Safety-Thinking

![QwenMerlin](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67329d3f69fded92d56ab41a%2FDWJ7JC09BHG3z1q-8-esc.jpeg)

**4B parameters • 1M context possible • Safety reasoning**

[🤗 Model](https://huggingface.co/MerlinSafety/Qwen3.5-4B-Safety-Thinking) | [📖 arXiv in progress]
## Model Overview

![qwen3.5_small_size_score](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67329d3f69fded92d56ab41a%2FuS3wACe_-wb_a72pvcJbk.png)

This model has been specifically optimized to excel in several key areas:

- **Structured Reasoning Quality:** Enhanced ability to break down complex problems and think step by step.
- **Instruction Adherence:** Superior capability to follow strict guidelines and constraints provided in prompts.
- **Safety-Aligned Behavior:** Designed to operate safely in practical assistant and autonomous agent workflows.
- **Robustness:** Increased resistance to common misalignment patterns and adversarial inputs.

It leverages a rigorous post-training stack that combines supervised reasoning tuning with alignment-oriented optimization, focusing heavily on reliable behavior in real-world applications.

## Training Approach

- **Base Model:** `Qwen/Qwen3.5-4B`
- **Methodology:** LoRA-based Supervised Fine-Tuning (SFT), merged into a BF16 checkpoint.
- **Reasoning Architecture:** Native support and normalization for the `<think>...</think>` format to explicitly separate the reasoning process from the final output.
- **Optimization Focus:** Enhancing safety reasoning, maximizing controllability, and ensuring response consistency.

## Data

This model was trained on **Merlin Research private datasets** built from internal R&D pipelines for:

- reasoning reliability improvements,
- instruction-following robustness,
- safety behavior refinement,
- misalignment reduction in applied scenarios,
- behavioral alignment auditing with Anthropic's Bloom & Petri framework.

![petri](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67329d3f69fded92d56ab41a%2FzivmYaesiM-tq4rriU_1x.jpeg)

(https://www.anthropic.com/research/petri-open-source-auditing)

## Intended Use Cases

This model is particularly well-suited for:

- Building safety-oriented reasoning assistants and chatbots.
- Tasks requiring strict, constrained instruction-following.
- Experimentation in AI alignment, safety research, and robustness testing.
- Agentic workflows where predictable and safe autonomous behavior is required.

## GGUF Status

GGUF artifacts are currently **in active development** and validation. At this stage, we recommend using the **BF16 Transformers checkpoint** for stable results. Updated and fully validated GGUF builds will be published in future releases.

**For Ollama:**

```bash
ollama create qwen35-safety-thinking-bf16 -f Modelfile
ollama run qwen35-safety-thinking-bf16
```

## Organization

Designed, developed, and maintained with ❤️ by **Merlin Research**.

## Citation

If you use this model in your research or applications, please cite it as follows:

```bibtex
@misc{qwen3.5-4b-safety-thinking,
  author       = {Merlin Research},
  title        = {Qwen3.5-4B-Safety-Thinking: A Reasoning and Safety Aligned Model},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MerlinSafety/Qwen3.5-4B-Safety-Thinking}},
  note         = {Base model: Qwen/Qwen3.5-4B}
}
```
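## Appendix: Handling Thinking Output

Thinking-style models in this family separate the reasoning trace from the final answer in the generated text. A minimal sketch of splitting the two, assuming Qwen-style `<think>…</think>` delimiters (the exact tag convention is our assumption here; adjust to whatever markers the chat template actually emits):

```python
import re

# Assumed delimiter convention (Qwen-style thinking tags); change this
# pattern if the chat template emits different markers.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Separate the reasoning trace from the final answer.

    Returns (reasoning, answer); reasoning is "" when no think block
    is present in the generated text.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Example with a hypothetical model output:
raw = "<think>The user asks for 2+2. That is 4.</think>The answer is 4."
reasoning, answer = split_thinking(raw)
print(answer)  # -> The answer is 4.
```

Keeping the split in post-processing (rather than stripping tags inside the generation loop) lets an application log the reasoning trace for safety auditing while showing users only the final answer.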