gpt-oss-20b

An Open-Source Implementation and Replication of gpt-oss-20b

💻 GitHub Repository (The Code)


An Open-Source Model, For Real.

Recently, OpenAI released its gpt-oss-20b model, giving the community a powerful open-weight model. While the weights are available, the code to train such a model from scratch—the truly open-source part—was not.

This project fills that gap.

Our goal was to create a complete, clean, and high-performance codebase that replicates the gpt-oss-20b architecture and allows anyone to train a similar model from the ground up. The code is the core contribution, built to empower researchers, developers, and enthusiasts.

>> Explore the full, open-source training and inference code on GitHub <<

This model card hosts the weights for a 20B parameter model trained using our open-source codebase. It serves as a proof-of-concept that our implementation is correct, stable, and capable of training models at scale.

This checkpoint comes from a very early stage of training (only 1900 iterations). It was trained on the TinyStories dataset to validate our code on a 5x H100 GPU setup. While the model is learning to generate simple stories, its capabilities are limited by the short training run; its primary purpose is to demonstrate that our open-source code works.


Model Details

  • Architecture: A 20B parameter Transformer implementing the key features of gpt-oss, including:
    • Mixture-of-Experts (MoE) with 32 experts (a minimal routing sketch follows this list)
    • Grouped-Query Attention (GQA)
    • Sliding Window Attention
    • Rotary Position Embeddings (RoPE) with YaRN-style scaling
  • Training Data: Trained on the roneneldan/TinyStories dataset, comprising approximately 490 million tokens of simple, narrative text.
  • Training Procedure: Trained from scratch on 5x NVIDIA H100 GPUs for only 1900 iterations using our custom PyTorch FSDP (Fully Sharded Data Parallel) framework. This represents less than 10% of a single epoch, meaning the model is still in the very early phases of learning.
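
For intuition, here is a minimal sketch of the top-k expert routing described in the list above. It is illustrative only: the 32-expert count comes from the model details, while top_k, d_model, and d_ff are placeholder values, not hyperparameters taken from model.py.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    # Illustrative top-k Mixture-of-Experts layer; see model.py in the repo for the real implementation.
    def __init__(self, d_model=512, d_ff=2048, n_experts=32, top_k=4):  # top_k/d_model/d_ff are placeholders
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: (batch, seq, d_model)
        scores = self.router(x)                            # (batch, seq, n_experts)
        weights, expert_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                     # naive dispatch loop: clear, not fast
            for e, expert in enumerate(self.experts):
                mask = expert_idx[..., slot] == e          # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check
moe = TopKMoE()
print(moe(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])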

How to Use with transformers

You can easily use this model with the standard transformers library.

First, install the necessary libraries:

pip install transformers torch

Then, run the model using the following snippet:

from transformers import pipeline
import torch

# Use our community-trained model ID
model_id = "omunaman/Open_Source_GPT_OSS_20B"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16, # Use bfloat16 for H100/A100, or float16 for other GPUs
    device_map="auto",
)

# The model excels at continuing story prompts
prompt = "Once upon a time, in a land full of candy mountains, lived a little dragon named Sparky. One day, Sparky decided he wanted to"

outputs = pipe(
    prompt,
    max_new_tokens=150,
    temperature=0.8,
    top_k=50,
    do_sample=True,
)

print(outputs[0]["generated_text"])

Note: the output will be in the style of a children's story. Its quality and coherence reflect the very short training run (1900 iterations), so think of it as a promising starting point.


The True Open-Source Contribution: The Code

The most valuable part of this project is the codebase, which is designed to be a reference for large-scale training.

Explore the Repository

Our repository includes:

  • train.py: A robust, FSDP-based training script featuring memory-efficient meta-device initialization (see the sketch after this list) and scalable sharded checkpointing.
  • model.py: A clean implementation of the gpt-oss architecture with MoE, GQA, and more.
  • sample.py: A deadlock-free, FSDP-aware script for multi-GPU inference.
  • export_to_safetensors.py: A utility to convert internal FSDP checkpoints into the Hugging Face safetensors format for easy sharing.
  • prepare.py: A simple script for tokenizing datasets into the required format (a rough data-preparation sketch appears at the end of this section).
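
To make the "memory-efficient meta-device initialization" used by train.py concrete, here is a minimal sketch of the general FSDP pattern, using a stand-in model (train.py's exact flow may differ): the module is first built on the meta device so no parameter memory is allocated, and FSDP then materializes only each rank's shard on its local GPU.

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def build_fsdp_model():
    # 1) Construct the module on the meta device: parameter shapes exist, but no memory is allocated.
    with torch.device("meta"):
        model = nn.Sequential(*(nn.Linear(4096, 4096) for _ in range(8)))  # stand-in for the 20B model

    # 2) Wrap with FSDP; each rank materializes only its own shard as empty tensors on the local GPU.
    #    Note: parameters are uninitialized at this point; a real script would reset them or load a checkpoint.
    return FSDP(
        model,
        param_init_fn=lambda m: m.to_empty(device=torch.cuda.current_device(), recurse=False),
        device_id=torch.cuda.current_device(),
    )

# Run under torchrun with torch.distributed.init_process_group() already called.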

We encourage developers to explore the repository, learn from the implementation, and use it to train their own powerful models.
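
As a rough illustration of the data-preparation step that prepare.py handles, the sketch below tokenizes TinyStories into a flat binary file of token IDs. The tokenizer choice, EOS handling, and on-disk layout here are assumptions for illustration; the real prepare.py may differ.

import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumptions for illustration: the actual prepare.py may use a different tokenizer and file layout.
tok = AutoTokenizer.from_pretrained("gpt2")
ds = load_dataset("roneneldan/TinyStories", split="train")

chunks = []
for example in ds:
    ids = tok.encode(example["text"]) + [tok.eos_token_id]  # mark story boundaries with EOS
    chunks.append(np.array(ids, dtype=np.uint16))            # GPT-2's ~50k vocab fits in uint16

np.concatenate(chunks).tofile("train.bin")                    # flat token stream for the training script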


License

The code in the repository and the weights of this model are released under the permissive Apache 2.0 license.


Citation

If you use our codebase or find this work helpful in your research, please consider citing our repository:

@software{Vizuara_GPT-OSS_Replication_2025,
  author = {Naman and Dr. Raj Dandekar},
  title = {{An Open-Source Implementation of gpt-oss-20b}},
  month = {September},
  year = {2025},
  url = {https://github.com/OmuNaman/gpt-oss}
}