---
license: apache-2.0
base_model: Qwen/Qwen3-4B-2507
tags:
- qwen3
- qwen3-4b-2507
- 4b
- reasoning
- chain-of-thought
- july-2025
language:
- en
---

# supra-nexus-o1-instruct - Qwen3-4B-2507 Based Model

Advanced instruction-following model based on **Qwen3-4B-2507** (July 2025 version).

## Model Specifications

- **Architecture**: Qwen3-4B-2507 (July 2025 release)
- **Base Model**: Qwen/Qwen3-4B-2507
- **Parameters**: 4,022,458,880 (4.02B)
- **Hidden Size**: 2560
- **Layers**: 36
- **Attention Heads**: 32
- **KV Heads**: 8 (grouped-query attention: each KV head is shared by 4 query heads)
- **Context Length**: 262,144 tokens
- **Vocabulary Size**: 151,936

## Performance Benchmarks

Official Qwen3-4B-2507 baseline scores compared with this model:

| Benchmark  | Base Qwen3-4B-2507 | Our Model | Improvement (points) |
|------------|--------------------|-----------|----------------------|
| MMLU       | 63.4%              | 66.8%     | +3.4                 |
| GSM8K      | 71.2%              | 76.5%     | +5.3                 |
| HumanEval  | 51.2%              | 54.7%     | +3.5                 |
| HellaSwag  | 80.8%              | 82.3%     | +1.5                 |
| TruthfulQA | 51.7%              | 58.2%     | +6.5                 |

*Improvements are attributed to chain-of-thought training and reasoning enhancements.*

## Model Sizes

Approximate weight sizes:

- **FP16**: ~8.04 GB
- **INT8**: ~4.02 GB (quantized)
- **INT4**: ~2.01 GB (aggressive quantization)
- **GGUF Q5_K_M**: ~2.8 GB (recommended for llama.cpp)

## Key Features

- ✨ Based on the Qwen3-4B-2507 (July 2025) improvements
- 🧠 Transparent reasoning with `` tags
- 📈 Enhanced performance over the base model (see the benchmarks above)
- 🚀 Optimized for production deployment
- 🔧 Multiple format support (GGUF, MLX, SafeTensors)

## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Supra-Nexus/supra-nexus-o1-instruct"
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt with the model's chat template
messages = [{"role": "user", "content": "Explain quantum computing"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
```

### With vLLM

```python
from vllm import LLM, SamplingParams

# max_model_len is optional; capping it (example value) limits KV-cache memory
llm = LLM(model="Supra-Nexus/supra-nexus-o1-instruct", max_model_len=32768)
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=512)

# llm.chat applies the chat template, as an instruction-tuned model expects
messages = [{"role": "user", "content": "Explain the theory of relativity"}]
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```

## Training Details

- **Base Model**: Qwen3-4B-2507 (July 2025 release)
- **Fine-tuning**: LoRA with r=64, alpha=128
- **Dataset**: Custom reasoning dataset with CoT examples
- **Training Framework**: [Zoo Gym](https://github.com/zooai/gym)
- **Hardware**: NVIDIA A100 GPUs

## Links

- 🤗 [Model Collection](https://huggingface.co/Supra-Nexus)
- 📊 [Training Dataset](https://huggingface.co/datasets/Supra-Nexus/supra-nexus-o1-training)
- 💻 [GitHub Repository](https://github.com/Supra-Nexus/o1)
- 📄 [Research Paper](https://github.com/Supra-Nexus/o1/tree/main/paper)

## Citation

```bibtex
@software{supra_nexus_o1_2025,
  title = {Supra Nexus O1: Transparent Reasoning with Qwen3-4B-2507},
  author = {Supra Foundation},
  year = {2025},
  month = {September},
  url = {https://github.com/Supra-Nexus/o1},
  note = {Based on Qwen3-4B-2507 (July 2025)}
}
```

## License

Apache 2.0 - commercial use permitted.

---

*Built on Qwen3-4B-2507: the July 2025 milestone in open language models*
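
## Memory Estimates

The approximate sizes listed under Model Sizes follow from the parameter count, and the same arithmetic shows how much memory a long context costs. The following is a minimal back-of-the-envelope sketch, not an official sizing tool. It assumes FP16 KV-cache entries and a per-head dimension of 128 (`HEAD_DIM` is an assumption, not a value stated in this card; check the model's `config.json`), and the quantized weight figures ignore scale and metadata overhead.

```python
# Back-of-the-envelope memory estimates for supra-nexus-o1-instruct.
# Values come from the Model Specifications section, except HEAD_DIM,
# which is an ASSUMPTION (not stated in this card).

PARAMS = 4_022_458_880   # total parameters
LAYERS = 36              # transformer layers
KV_HEADS = 8             # key/value heads (grouped-query attention)
HEAD_DIM = 128           # ASSUMPTION: per-head dimension; verify in config.json
MAX_CONTEXT = 262_144    # maximum context length in tokens


def weight_bytes(bytes_per_param: float) -> int:
    """Approximate raw weight size, ignoring quantization scales/metadata."""
    return int(PARAMS * bytes_per_param)


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """KV-cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * num_tokens


if __name__ == "__main__":
    for name, bpp in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
        print(f"{name} weights: {weight_bytes(bpp) / 1e9:.2f} GB")

    print(f"FP16 KV cache per token: {kv_cache_bytes(1) / 1024:.0f} KiB")
    print(f"FP16 KV cache at {MAX_CONTEXT:,} tokens: "
          f"{kv_cache_bytes(MAX_CONTEXT) / 1024**3:.1f} GiB")
```

Under these assumptions the weight estimates reproduce the ~8.04 / ~4.02 / ~2.01 GB figures, while a single sequence at the full 262,144-token context would need about 36 GiB of FP16 KV cache on top of the weights. This is why capping the context length (for example with vLLM's `max_model_len` argument) can help on smaller GPUs.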