supra-nexus-o1-instruct - Qwen3-4B-2507 Based Model

Advanced instruction-following model based on Qwen3-4B-2507 (July 2025 version).

Model Specifications

Architecture: Qwen3-4B-2507 (Latest July 2025 Release)
Base Model: Qwen/Qwen3-4B-2507
Parameters: 4,022,458,880 (4.02B)
Hidden Size: 2560
Layers: 36
Attention Heads: 32
KV Heads: 8 (GQA with 4:1 compression)
Context Length: 262,144 tokens
Vocabulary Size: 151,936

Performance Benchmarks

Official Qwen3-4B-2507 baseline performance with our enhancements:

Benchmark	Base Qwen3-4B-2507	Our Model	Improvement
MMLU	63.4%	66.8%	+3.4%
GSM8K	71.2%	76.5%	+5.3%
HumanEval	51.2%	54.7%	+3.5%
HellaSwag	80.8%	82.3%	+1.5%
TruthfulQA	51.7%	58.2%	+6.5%

Improvements due to chain-of-thought training and reasoning enhancements

Model Sizes

FP16: ~8.04 GB
INT8: ~4.02 GB (Quantized)
INT4: ~2.01 GB (Aggressive Quantization)
GGUF Q5_K_M: ~2.8 GB (Recommended for llama.cpp)

Key Features

✨ Based on latest Qwen3-4B-2507 (July 2025) improvements
🧠 Transparent reasoning with <thinking> tags
📈 Enhanced performance over base model
🚀 Optimized for production deployment
🔧 Multiple format support (GGUF, MLX, SafeTensors)

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Supra-Nexus/supra-nexus-o1-instruct")
tokenizer = AutoTokenizer.from_pretrained("Supra-Nexus/supra-nexus-o1-instruct")

# Example usage
messages = [{"role": "user", "content": "Explain quantum computing"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

With vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="Supra-Nexus/supra-nexus-o1-instruct")
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=512)

prompts = ["Explain the theory of relativity"]
outputs = llm.generate(prompts, sampling_params)

Training Details

Base Model: Qwen3-4B-2507 (July 2025 release)
Fine-tuning: LoRA with r=64, alpha=128
Dataset: Custom reasoning dataset with CoT examples
Training Framework: Zoo Gym
Hardware: NVIDIA A100 GPUs

Citation

@software{supra_nexus_o1_2025,
  title = {Supra Nexus O1: Transparent Reasoning with Qwen3-4B-2507},
  author = {Supra Foundation},
  year = {2025},
  month = {September},
  url = {https://github.com/Supra-Nexus/o1},
  note = {Based on Qwen3-4B-2507 (July 2025)}
}

License

Apache 2.0 - Commercial use permitted

Built on Qwen3-4B-2507 - The July 2025 milestone in open language models

Downloads last month: 11

Safetensors

Model size

0.6B params

Tensor type

BF16

U32

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Supra-Nexus/supra-nexus-o1-instruct

Finetunes

3 models

Supra-Nexus
/

supra-nexus-o1-instruct