Nanochat d24 Model

This is a nanochat model in the d24 configuration (24 transformer layers), trained on 4x NVIDIA RTX 3090 GPUs (24 GB each).

Model Details

  • Model Type: Transformer-based Language Model
  • Architecture: GPT-style decoder-only transformer
  • Parameters: ~1233.1M (≈1.2B)
  • Layers: 24
  • Embedding Dimension: 1536
  • Attention Heads: 12
  • Vocabulary Size: 65536
  • Sequence Length: 2048
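
For reference, the hyperparameters above can be gathered into a small config object. The sketch below is illustrative only, not nanochat's actual config class; the field names (n_layer, n_embd, etc.) are assumptions borrowed from common GPT-style codebases, and the head dimension is derived as 1536 / 12 = 128.

from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Hypothetical config mirroring the card above; field names are
    # assumptions in the style of GPT codebases, not nanochat's exact API.
    n_layer: int = 24          # transformer blocks
    n_embd: int = 1536         # embedding dimension
    n_head: int = 12           # attention heads (head_dim = 1536 // 12 = 128)
    vocab_size: int = 65536    # 2**16 BPE tokens
    sequence_len: int = 2048   # maximum context length

cfg = ModelConfig()
assert cfg.n_embd % cfg.n_head == 0  # head_dim must divide evenly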

Training Details

  • Training Step: 700
  • Validation BPB (bits per byte): N/A
  • Hardware: 4x NVIDIA RTX 3090 (24GB VRAM each)
  • Training Framework: PyTorch with Distributed Data Parallel
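
The card notes Distributed Data Parallel (DDP) across the four GPUs. As a rough illustration of how PyTorch DDP wraps a model for multi-GPU gradient synchronization (this is generic PyTorch, not nanochat's actual training loop; it assumes a launch via torchrun, which sets RANK/LOCAL_RANK/WORLD_SIZE in the environment):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Generic DDP setup sketch; nanochat's real training script differs.
# Assumes launch via: torchrun --nproc_per_node=4 this_script.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Placeholder module; the real model is the 24-layer transformer above.
model = torch.nn.Linear(10, 10).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])  # syncs gradients across 4 ranks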

Usage

This model can be used with the nanochat codebase:

from nanochat.checkpoint_manager import load_model
import torch

# Pick a GPU if available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the SFT checkpoint saved at training step 700 in eval mode.
model, tokenizer, meta = load_model("sft", device, phase="eval", step=700)
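
Once loaded, generation might look like the following greedy-decoding loop. This is a minimal sketch, assuming the model follows the usual GPT forward(ids) -> logits convention and that the tokenizer exposes encode/decode; for the supported chat-style inference path, consult the nanochat repository.

# Minimal greedy decoding sketch (assumptions: model(ids) returns logits of
# shape [batch, seq, vocab], tokenizer has encode()/decode(); the actual
# nanochat inference path may differ -- check the repo).
prompt_ids = tokenizer.encode("Hello, nanochat!")
ids = torch.tensor([prompt_ids], dtype=torch.long, device=device)

model.eval()
with torch.no_grad():
    for _ in range(64):                       # generate up to 64 tokens
        logits = model(ids)                   # [1, T, 65536]
        next_id = logits[0, -1].argmax()      # greedy pick of next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0].tolist()))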

Citation

If you use this model, please cite:

@misc{karpathy2025nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that \$100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}