# gptnano_5M_vanilla_full_20250813_211139
A 5M-parameter character-level GPT model trained with the vanilla_full approach.
## Model Details
- Architecture: GPTNanoLM (Character-level GPT)
- Parameters: 5M (4,886,016 total)
- Training Category: vanilla_full
- Context Length: 512 characters
- Vocabulary: 80 characters
- Best Validation Loss: 0.5571
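Character-level tokenization means each of the 80 vocabulary characters maps directly to a single token id, with no subword merging. The sketch below shows a minimal tokenizer of this kind; the actual CharTokenizer class in the original training code may differ, and the tokenizer_chars.txt file format (all vocabulary characters in one file) is an assumption here:

```python
# Minimal character-level tokenizer sketch. The real CharTokenizer from the
# training code may differ; the tokenizer_chars.txt format is assumed.
class SimpleCharTokenizer:
    def __init__(self, vocab_file: str):
        with open(vocab_file, encoding="utf-8") as f:
            chars = list(f.read().rstrip("\n"))  # expect 80 characters
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
        self.itos = {i: ch for i, ch in enumerate(chars)}  # id -> char

    def encode(self, text: str) -> list[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)
```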
## Training Configuration
- Dataset: stanpony/tinystories-char-clean-az09-punct
- Batch Size: 128
- Learning Rate: 0.001
- Architecture: 6 layers, 8 attention heads, 256-dimensional embeddings
## Files
- model.safetensors: Model weights (best checkpoint)
- config.json: Model configuration
- training_details.txt: Complete training information and usage instructions
- tokenizer_chars.txt: Character vocabulary
- README.md: This model card
## Quick Start
⚠️ See training_details.txt for complete usage instructions and code examples.
This model uses character-level tokenization and requires the GPTNanoLM class and CharTokenizer from the original training code.
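As a rough orientation, the sketch below shows how loading and sampling might look once GPTNanoLM and CharTokenizer are importable from your copy of the training code. The repo id, module name, constructor keyword names, and the model's forward signature are all assumptions, not confirmed by this card; treat training_details.txt as the authoritative reference.

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Hypothetical import: these classes ship with the original training code,
# not with this repository. Adjust the module path to your setup.
from gptnano import GPTNanoLM, CharTokenizer

repo_id = "stanpony/gptnano_5M_vanilla_full_20250813_211139"  # assumed repo id

weights_path = hf_hub_download(repo_id, "model.safetensors")
vocab_path = hf_hub_download(repo_id, "tokenizer_chars.txt")

# Constructor keyword names are assumptions; the values come from this card:
# 6 layers, 8 heads, 256-d embeddings, 512-char context, 80-char vocabulary.
model = GPTNanoLM(n_layer=6, n_head=8, n_embd=256, block_size=512, vocab_size=80)
model.load_state_dict(load_file(weights_path))
model.eval()

tokenizer = CharTokenizer(vocab_path)  # assumed constructor signature

# Plain greedy decoding loop (standard technique, not taken from the
# training code); assumes the model returns (batch, time, vocab) logits.
ids = torch.tensor([tokenizer.encode("Once upon a time")])
with torch.no_grad():
    for _ in range(100):
        logits = model(ids[:, -512:])  # crop to the 512-char context window
        next_id = logits[:, -1, :].argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
print(tokenizer.decode(ids[0].tolist()))
```

Swapping the greedy argmax for temperature sampling is the usual way to get more varied generations from a small character-level model.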
## Performance
- Best Validation Loss: 0.5571
- Training Category: vanilla_full
For detailed training metrics, hyperparameters, and usage examples, see the training_details.txt file.