Nanochat d24 Model

This is a nanochat model in the d24 configuration (24 transformer layers), trained on 4x NVIDIA RTX 3090 GPUs (24 GB each).

Model Details

  • Model Type: Transformer-based Language Model
  • Architecture: GPT-style decoder-only transformer
  • Parameters: ~1233.1M (≈1.2B)
  • Layers: 24
  • Embedding Dimension: 1536
  • Attention Heads: 12
  • Vocabulary Size: 65536
  • Sequence Length: 2048
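
For reference, the hyperparameters above can be gathered into a small config object. The sketch below is illustrative only, not nanochat's actual config class; the field names (n_layer, n_embd, etc.) are assumptions borrowed from common GPT-style codebases, and the head dimension is derived as 1536 / 12 = 128.

from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Hypothetical config mirroring the card above; field names are
    # assumptions in the style of GPT codebases, not nanochat's exact API.
    n_layer: int = 24          # transformer blocks
    n_embd: int = 1536         # embedding dimension
    n_head: int = 12           # attention heads (head_dim = 1536 // 12 = 128)
    vocab_size: int = 65536    # 2**16 BPE tokens
    sequence_len: int = 2048   # maximum context length

cfg = ModelConfig()
assert cfg.n_embd % cfg.n_head == 0  # head_dim must divide evenly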

Training Details

  • Training Step: 700
  • Validation BPB (bits per byte): N/A
  • Hardware: 4x NVIDIA RTX 3090 (24GB VRAM each)
  • Training Framework: PyTorch with Distributed Data Parallel
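
The card notes Distributed Data Parallel (DDP) across the four GPUs. As a rough illustration of how PyTorch DDP wraps a model for multi-GPU gradient synchronization (this is generic PyTorch, not nanochat's actual training loop; it assumes a launch via torchrun, which sets RANK/LOCAL_RANK/WORLD_SIZE in the environment):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Generic DDP setup sketch; nanochat's real training script differs.
# Assumes launch via: torchrun --nproc_per_node=4 this_script.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Placeholder module; the real model is the 24-layer transformer above.
model = torch.nn.Linear(10, 10).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])  # syncs gradients across 4 ranks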

Usage

This model can be used with the nanochat codebase:

from nanochat.checkpoint_manager import load_model
import torch

# Pick a GPU if available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the SFT checkpoint saved at training step 700 in eval mode.
model, tokenizer, meta = load_model("sft", device, phase="eval", step=700)
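
Once loaded, generation might look like the following greedy-decoding loop. This is a minimal sketch, assuming the model follows the usual GPT forward(ids) -> logits convention and that the tokenizer exposes encode/decode; for the supported chat-style inference path, consult the nanochat repository.

# Minimal greedy decoding sketch (assumptions: model(ids) returns logits of
# shape [batch, seq, vocab], tokenizer has encode()/decode(); the actual
# nanochat inference path may differ -- check the repo).
prompt_ids = tokenizer.encode("Hello, nanochat!")
ids = torch.tensor([prompt_ids], dtype=torch.long, device=device)

model.eval()
with torch.no_grad():
    for _ in range(64):                       # generate up to 64 tokens
        logits = model(ids)                   # [1, T, 65536]
        next_id = logits[0, -1].argmax()      # greedy pick of next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0].tolist()))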

Citation

If you use this model, please cite:

@misc{karpathy2025nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that \$100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}