# nanochat d24 Model
This is a nanochat model trained on 4x RTX 3090 GPUs (24GB each).
## Model Details
- Model Type: Transformer-based Language Model
- Architecture: GPT-style decoder-only transformer
- Parameters: ~1233.1M
- Layers: 24
- Embedding Dimension: 1536
- Attention Heads: 12
- Vocabulary Size: 65536
- Sequence Length: 2048
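
For reference, these hyperparameters map onto a model configuration along the following lines. This is an illustrative sketch only; the field names mirror common GPT-style configs and are not guaranteed to match nanochat's actual config class.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Hyperparameters from the list above; field names are assumptions,
    # not necessarily identical to nanochat's own GPT config.
    sequence_len: int = 2048   # maximum context length
    vocab_size: int = 65536    # tokenizer vocabulary size
    n_layer: int = 24          # transformer blocks
    n_head: int = 12           # attention heads (head dim = 1536 // 12 = 128)
    n_embd: int = 1536         # embedding / model dimension

config = ModelConfig()
```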
## Training Details
- Training Step: 700
- Validation BPB: N/A
- Hardware: 4x NVIDIA RTX 3090 (24GB VRAM each)
- Training Framework: PyTorch with Distributed Data Parallel
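
The multi-GPU setup follows the standard PyTorch DDP pattern: one process per GPU, each holding a full model replica, with gradients all-reduced across ranks on the backward pass. A minimal sketch of that pattern (generic PyTorch, not nanochat's actual training script):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each of the 4 processes.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1536, 1536).cuda(local_rank)  # stand-in for the real model
    model = DDP(model, device_ids=[local_rank])            # wraps the model for gradient sync

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    x = torch.randn(8, 1536, device=local_rank)
    loss = model(x).pow(2).mean()
    loss.backward()      # DDP all-reduces gradients across the 4 GPUs here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --standalone --nproc_per_node=4 <script>.py
```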
## Usage
This model can be used with the nanochat codebase:

```python
import torch
from nanochat.checkpoint_manager import load_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, tokenizer, meta = load_model("sft", device, phase="eval", step=700)
```
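
After loading, text generation is the usual autoregressive loop: encode the prompt, repeatedly run the token sequence through the model, pick the next token from the final-position logits, and decode at the end. A minimal greedy-decoding sketch, assuming `tokenizer.encode`/`tokenizer.decode` operate on token-id lists and that calling the model on a `(batch, seq)` tensor returns `(batch, seq, vocab)` logits (check the nanochat codebase for the exact interfaces):

```python
prompt = "Why is the sky blue?"
tokens = tokenizer.encode(prompt)  # assumed: str -> list[int]

model.eval()
with torch.no_grad():
    ids = torch.tensor([tokens], dtype=torch.long, device=device)
    for _ in range(64):                       # generate up to 64 new tokens
        logits = model(ids)                   # assumed: (1, T) -> (1, T, vocab) logits
        next_id = logits[0, -1].argmax()      # greedy: most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0].tolist()))      # assumed: list[int] -> str
```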
## Citation
If you use this model, please cite:
```bibtex
@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}
```