gptnano_5M_vanilla_full_20250813_211139

A 5M-parameter character-level GPT model trained with the vanilla_full approach.

Model Details

  • Architecture: GPTNanoLM (Character-level GPT)
  • Parameters: 5M (4,886,016 total)
  • Training Category: vanilla_full
  • Context Length: 512 characters
  • Vocabulary: 80 characters
  • Best Validation Loss: 0.5571
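
As a concrete illustration of the character-level setup above, here is a minimal tokenizer sketch. It assumes tokenizer_chars.txt stores the 80-character vocabulary one character per line; the actual CharTokenizer shipped with the training code may read it differently.

```python
# Minimal character-level tokenizer sketch. Assumption: tokenizer_chars.txt
# lists the 80-character vocabulary one character per line; the real
# CharTokenizer from the training code may use a different format.
class CharTokenizerSketch:
    def __init__(self, vocab_path="tokenizer_chars.txt"):
        with open(vocab_path, encoding="utf-8") as f:
            chars = [line.rstrip("\n") for line in f]
        self.stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> char

    def encode(self, text: str) -> list[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)
```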

Training Configuration

  • Dataset: stanpony/tinystories-char-clean-az09-punct
  • Batch Size: 128
  • Learning Rate: 0.001
  • Architecture: 6 layers, 8 attention heads, 256-dimensional embeddings
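
The reported parameter count is consistent with these hyperparameters. A back-of-the-envelope check, assuming a standard GPT block (learned positional embeddings, 4x MLP expansion, biases throughout, and an LM head tied to the token embedding):

```python
d, n_layer, vocab, ctx = 256, 6, 80, 512

tok_emb = vocab * d                        # token embedding: 20,480
pos_emb = ctx * d                          # positional embedding: 131,072
attn = 4 * d * d + 4 * d                   # Q/K/V/output projections + biases
mlp = 8 * d * d + 4 * d + d                # two linears, 4x expansion, + biases
norms = 2 * 2 * d                          # two LayerNorms per block
block = attn + mlp + norms
total = tok_emb + pos_emb + n_layer * block + 2 * d  # + final LayerNorm
print(f"{total:,}")  # 4,890,624 -- within 0.1% of the reported 4,886,016;
                     # the small residual reflects bias/norm details not
                     # visible from this card
```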

Files

  • model.safetensors: Model weights (best checkpoint)
  • config.json: Model configuration
  • training_details.txt: Complete training information and usage instructions
  • tokenizer_chars.txt: Character vocabulary
  • README.md: This model card

Quick Start

⚠️ See training_details.txt for complete usage instructions and code examples.

This model uses character-level tokenization and requires the GPTNanoLM class and CharTokenizer from the original training code.
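
As a rough template (not verbatim code from training_details.txt), the sketch below shows how loading and sampling might look. The import path for GPTNanoLM and CharTokenizer, their constructor signatures, and the shape of the model's output logits are all assumptions:

```python
import json
import torch
from safetensors.torch import load_file

# Assumption: GPTNanoLM and CharTokenizer come from the original training
# code (see training_details.txt); the module path below is hypothetical.
from model import GPTNanoLM, CharTokenizer

config = json.load(open("config.json"))
model = GPTNanoLM(**config)                  # assumed to accept the config keys
model.load_state_dict(load_file("model.safetensors"))
model.eval()

tokenizer = CharTokenizer("tokenizer_chars.txt")  # assumed constructor

# Greedy character-by-character generation within the 512-character context.
ids = torch.tensor([tokenizer.encode("Once upon a time")])
with torch.no_grad():
    for _ in range(200):
        logits = model(ids[:, -512:])        # assumed output: [B, T, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
print(tokenizer.decode(ids[0].tolist()))
```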

Performance

  • Best Validation Loss: 0.5571
  • Training Category: vanilla_full
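
Assuming the validation loss is mean cross-entropy in nats per character, it corresponds to a per-character perplexity of exp(0.5571) ≈ 1.75.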

For detailed training metrics, hyperparameters, and usage examples, see the training_details.txt file.
