---
language:
- en
- zh
- ko
license: apache-2.0
base_model: Qwen/Qwen3.5-9B
tags:
- unsloth
- qwen
- qwen3.5
- reasoning
- chain-of-thought
- distillation
- Dense
pipeline_tag: text-generation
datasets:
- Jackrong/Qwen3.5-reasoning-700x
- Roman1111111/gemini-3.1-pro-hard-high-reasoning
---

# 🌟 Qwen3.5-9B-Gemini-3.1-Pro-Reasoning-Distill

## 💡 Model Introduction

**Qwen3.5-9B-Gemini-3.1-Pro-Reasoning-Distill** is a reasoning model fine-tuned on top of **Qwen3.5-9B**. It is primarily optimized through high-density reasoning distillation sourced from **Gemini 3.1**, and additionally incorporates reasoning traces distilled from **Qwen3.5-27B** and a broader **Gemini 3.0 Pro** reasoning corpus.

Through Supervised Fine-Tuning focused on structured analytical behavior, this model aims to reshape the base model's reasoning style into a more coherent, better-organized, higher-density Chain-of-Thought (CoT) pattern. It is especially designed to improve decomposition, planning, abstraction, and response cleanliness on complex multi-step tasks.

---

## 🧠 Example of Learned Reasoning Scaffold

This model inherits a more structured reasoning style influenced by **Gemini 3.1-style analytical planning**. Compared with looser, more exploratory reasoning patterns, it tends to organize the problem before answering:

```text
My Thought Process / My Analysis of the problem:
1. Restate the task and identify the true objective.
2. Abstract the problem into a higher-level reasoning frame.
3. Identify the key mechanism, failure mode, or constraint.
4. Separate likely misconceptions from the actual core issue.
5. Plan the structure of the final response.
6. Deliver a cleaner, more direct, and higher-density answer.
...
```

---

## 🗺️ Training Pipeline Overview

```text
Base Model (Qwen3.5-9B)
        │
        ▼
Supervised Fine-Tuning (SFT) + LoRA + Reasoning Distillation
(Response-Only Training, loss masked on "<|im_start|>assistant\n")
        │
        ▼
Final Model (Text Only)
(Jackrong/Qwen3.5-9B-Gemini-3.1-Pro-Reasoning-Distill)
```

## 📋 Stage Details

### 🔹 Supervised Fine-Tuning (SFT)

- **Objective:** Inject reasoning behavior into Qwen3.5-9B and strengthen its performance on complex analytical tasks requiring decomposition and multi-step inference.
- **Method:** The model is trained on distilled reasoning traces collected from stronger teacher-style reasoning sources, with the goal of transferring cleaner analytical structure, stronger planning habits, and more stable task-solving behavior.
- **Target Behavior:** Compared with a standard instruct model, the tuned model is expected to respond with more deliberate reasoning organization, less shallow guessing, and stronger cross-domain analytical consistency.

### 📚 All Datasets Used

The training data combines multiple reasoning distillation sources:

| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| [Roman1111111/gemini-3.1-pro-hard-high-reasoning](https://huggingface.co/datasets/Roman1111111/gemini-3.1-pro-hard-high-reasoning) | Primary high-quality reasoning source used to shape structured analytical style, planning behavior, and dense CoT patterns. |
| [Jackrong/Qwen3.5-reasoning-700x](https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x) | Additional Qwen-family reasoning trajectories distilled from Qwen3.5-27B, improving style stability and complementary reasoning diversity. |
| [Roman1111111/gemini-3-pro-10000x-hard-high-reasoning](https://huggingface.co/datasets/Roman1111111/gemini-3-pro-10000x-hard-high-reasoning) | A broader multi-domain reasoning corpus used to extend coverage across mathematics, systems, science, law, medicine, finance, and adversarial reasoning tasks. |

### 📊 Approximate Domain Composition

| Domain | Samples | Share |
|--------|--------:|------:|
| Mathematics / Logic | 3947 | 28.5% |
| Computer Science / Programming / Systems | 3019 | 21.8% |
| Security / Adversarial Reasoning | 1551 | 11.2% |
| Physics / Astronomy / Engineering | 1482 | 10.7% |
| Law / Philosophy / Humanities | 1191 | 8.6% |
| Biology / Medicine | 817 | 5.9% |
| Finance / Economics | 679 | 4.9% |
| Chemistry / Materials | 540 | 3.9% |
| Applied / Social Systems (Urban Planning, Traffic, Supply Chain, etc.) | 360 | 2.6% |
| Other | 264 | 1.9% |

⚠️ **Distillation & Task-Specific Fine-Tuning Effects:** This model has been further distilled and fine-tuned on top of the base model for reasoning-oriented tasks. While these techniques can improve performance on certain specialized tasks, they may also affect the model's generalization in broader scenarios and can lead to partial forgetting of pretraining knowledge. The extent of these effects depends on the quality, scale, and distribution of the training data used during distillation and fine-tuning, so the model's behavior may differ from the base model across tasks and application contexts. Users are encouraged to evaluate the model against their specific requirements before deployment.

## 🌟 Core Skills & Capabilities

1. **Structured Analytical Reasoning:** The model is optimized to identify the real task structure before generating an answer, rather than relying on shallow immediate completion.
2. **Improved Multi-Step Planning:** It performs more reliably on tasks requiring decomposition, constraint tracking, sequential planning, and trade-off analysis.
3. **Cross-Domain Reasoning Strength:** The training corpus provides broad reasoning coverage across math, programming, systems, physics, law, medicine, finance, chemistry, and applied domains.
4. **Security & Adversarial Awareness:** A dedicated portion of the distilled data covers adversarial, attack-defense, and failure-mode reasoning tasks, improving robustness on difficult prompts.
5. **Compact but Strong Footprint:** Built on a 9B base, the model aims to deliver denser reasoning behavior and cleaner analytical output than a generic instruct model of similar size.

## ⚠️ Limitations & Intended Use

- **Hallucination Risk:** Although reasoning behavior is improved, the model remains an autoregressive LLM and may still hallucinate niche facts, citations, or unverifiable real-world details.
- **Reasoning Style Bias:** Because the model is tuned for analytical depth, it may produce longer or more structured answers than necessary for very simple prompts.
- **Teacher-Style Distillation Bias:** Some response behaviors reflect the reasoning style of the teacher traces used during distillation rather than behavior native to the base model.
- **Preview Version Notice:** As a relatively specialized distilled reasoning model, surrounding inference templates, prompt formatting strategies, and ecosystem integrations may still require tuning. Users may encounter occasional compatibility differences depending on their runtime or deployment stack.

## 🙏 Acknowledgements

Special thanks to the **Qwen** team for the strong base architecture, and to the broader open-source ecosystem for enabling efficient reasoning distillation workflows.
We also acknowledge the value of the distilled reasoning corpora derived from **Gemini 3.1 Pro**, **Qwen3.5-27B**, and **Gemini 3.0 Pro**, which made this model possible.
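
The training pipeline above uses a response-only loss, masking everything up to and including the `<|im_start|>assistant\n` marker so that only assistant tokens are trained on. A minimal sketch of that masking step, using toy word-level tokens and a hypothetical `build_labels` helper (an illustration only, not the actual training code, which operates on tokenizer ids):

```python
# Response-only SFT label masking, sketched at the token level.
# Labels before (and including) the assistant marker are set to -100,
# the index that PyTorch's cross-entropy loss ignores, so only
# assistant-response tokens contribute to the loss.

IGNORE_INDEX = -100
ASSISTANT_MARKER = "<|im_start|>assistant\n"

def build_labels(tokens, marker=ASSISTANT_MARKER, ignore_index=IGNORE_INDEX):
    """Mask every token up to and including the last assistant marker."""
    labels = list(tokens)
    # Use the last occurrence so this toy version trains only on the
    # final assistant turn of a multi-turn prompt (a simplification).
    end = 0
    for i, tok in enumerate(tokens):
        if tok == marker:
            end = i + 1
    for i in range(end):
        labels[i] = ignore_index
    return labels

toks = ["<|im_start|>user\n", "Hello", "<|im_start|>assistant\n", "Hi", "there"]
print(build_labels(toks))
# → [-100, -100, -100, 'Hi', 'there']
```

In real trainers this corresponds to completion-only loss collation (e.g. TRL's `DataCollatorForCompletionOnlyLM`), applied over token ids rather than strings.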