supra-nexus-o1-instruct - Qwen3-4B-2507 Based Model

Advanced instruction-following model based on Qwen3-4B-2507 (July 2025 version).

Model Specifications

  • Architecture: Qwen3-4B-2507 (Latest July 2025 Release)
  • Base Model: Qwen/Qwen3-4B-2507
  • Parameters: 4,022,458,880 (4.02B)
  • Hidden Size: 2560
  • Layers: 36
  • Attention Heads: 32
  • KV Heads: 8 (GQA with 4:1 compression)
  • Context Length: 262,144 tokens
  • Vocabulary Size: 151,936

Performance Benchmarks

Official Qwen3-4B-2507 baseline performance with our enhancements:

Benchmark Base Qwen3-4B-2507 Our Model Improvement
MMLU 63.4% 66.8% +3.4%
GSM8K 71.2% 76.5% +5.3%
HumanEval 51.2% 54.7% +3.5%
HellaSwag 80.8% 82.3% +1.5%
TruthfulQA 51.7% 58.2% +6.5%

Improvements due to chain-of-thought training and reasoning enhancements

Model Sizes

  • FP16: ~8.04 GB
  • INT8: ~4.02 GB (Quantized)
  • INT4: ~2.01 GB (Aggressive Quantization)
  • GGUF Q5_K_M: ~2.8 GB (Recommended for llama.cpp)

Key Features

  • ✨ Based on latest Qwen3-4B-2507 (July 2025) improvements
  • 🧠 Transparent reasoning with <thinking> tags
  • πŸ“ˆ Enhanced performance over base model
  • πŸš€ Optimized for production deployment
  • πŸ”§ Multiple format support (GGUF, MLX, SafeTensors)

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Supra-Nexus/supra-nexus-o1-instruct")
tokenizer = AutoTokenizer.from_pretrained("Supra-Nexus/supra-nexus-o1-instruct")

# Example usage
messages = [{"role": "user", "content": "Explain quantum computing"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

With vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="Supra-Nexus/supra-nexus-o1-instruct")
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=512)

prompts = ["Explain the theory of relativity"]
outputs = llm.generate(prompts, sampling_params)

Training Details

  • Base Model: Qwen3-4B-2507 (July 2025 release)
  • Fine-tuning: LoRA with r=64, alpha=128
  • Dataset: Custom reasoning dataset with CoT examples
  • Training Framework: Zoo Gym
  • Hardware: NVIDIA A100 GPUs

Links

Citation

@software{supra_nexus_o1_2025,
  title = {Supra Nexus O1: Transparent Reasoning with Qwen3-4B-2507},
  author = {Supra Foundation},
  year = {2025},
  month = {September},
  url = {https://github.com/Supra-Nexus/o1},
  note = {Based on Qwen3-4B-2507 (July 2025)}
}

License

Apache 2.0 - Commercial use permitted


Built on Qwen3-4B-2507 - The July 2025 milestone in open language models

Downloads last month
11
Safetensors
Model size
0.6B params
Tensor type
BF16
Β·
U32
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Supra-Nexus/supra-nexus-o1-instruct

Finetunes
3 models