# Training Details for gptnano_5M_vanilla_full_20250813_211139
## Model Overview
- **Model Name**: gptnano_5M_vanilla_full_20250813_211139
- **Architecture**: GPTNanoLM (Character-level GPT)
- **Parameters**: 5M (4,886,016 total)
- **Training Category**: vanilla_full
- **Best Validation Loss**: 0.5571
## Dataset Information
- **Dataset**: stanpony/tinystories-char-clean-az09-punct
- **Tokenizer**: Character-level (no special tokens)
- **Context Length**: 512 characters
- **Character Vocabulary Size**: 80
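
A character-level tokenizer of this kind maps each character to one integer id and back. The sketch below is illustrative only: the `CharTokenizer` class and the sample vocabulary are hypothetical, and the model's actual 80-character vocabulary and its on-disk format are not reproduced here.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer (no special tokens)."""

    def __init__(self, chars):
        # Assign ids in the order the characters are given.
        self.itos = list(chars)
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}

    def encode(self, text):
        # Raises KeyError for characters outside the vocabulary.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)


# Sample vocabulary for illustration; NOT the model's actual 80 characters.
sample_chars = "abcdefghijklmnopqrstuvwxyz0123456789 .,!?'\"\n"
tok = CharTokenizer(sample_chars)
ids = tok.encode("once upon a time.")
```

With no special tokens, a sequence of 512 ids corresponds to exactly 512 characters of text, which is why the context length is stated in characters.
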
## Model Architecture
- **Embedding Dimension**: 256
- **Number of Attention Heads**: 8
- **Number of Layers**: 6
- **Block Size (Context Length)**: 512
- **Dropout**: 0.0
- **Vocabulary Size**: 80
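
The reported total of 4,886,016 parameters can be reproduced from these values under one plausible layer layout. The sketch below is an assumption-laden check, not a description of the actual GPTNanoLM code: the `count_params` helper is hypothetical, and it assumes bias-free query/key/value projections, biased output and MLP layers, a 4x MLP expansion, learned positional embeddings, and an output head tied to the token embedding. Other layouts could produce the same sum.

```python
def count_params(vocab_size, block_size, n_embd, n_layer):
    """Parameter count under the assumptions listed in the comments below."""
    d = n_embd
    # Assumption: bias-free query/key/value projections.
    qkv = 3 * d * d
    # Assumption: attention output projection with bias.
    attn_out = d * d + d
    # Assumption: MLP with 4x expansion and biases on both linear layers.
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    # Assumption: two LayerNorms per block, each with weight and bias.
    norms = 2 * 2 * d
    per_block = qkv + attn_out + mlp + norms

    # Assumption: learned token and positional embeddings, a final LayerNorm,
    # and an output head tied to the token embedding (no extra parameters).
    embeddings = vocab_size * d + block_size * d
    final_norm = 2 * d
    return n_layer * per_block + embeddings + final_norm


total = count_params(vocab_size=80, block_size=512, n_embd=256, n_layer=6)
print(f"{total:,}")  # 4,886,016
```

The number of attention heads does not enter the count, since the heads split the same 256-dimensional projections rather than adding new ones.
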
## Training Configuration
- **Batch Size**: 128
- **Learning Rate**: 0.001
- **Optimizer**: AdamW
- **Epochs**: 1
- **Device**: cuda
- **Framework**: PyTorch
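
As a rough sense of scale, the batch size and context length together give the number of characters processed per optimizer step. This is a back-of-the-envelope sketch that assumes every sequence fills the full context and that no gradient accumulation is used; neither is stated in this file.

```python
batch_size = 128
block_size = 512  # context length in characters

# Assumption: full-length sequences and one batch per optimizer step.
chars_per_step = batch_size * block_size
print(chars_per_step)  # 65536
```
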
## Training Category Details
- **Category**: vanilla_full
- **Description**: vanilla_full
## Files in this Repository
- `model.safetensors`: Model weights (best checkpoint)
- `config.json`: Model configuration parameters
- `training_details.txt`: This file with training information
- `tokenizer_chars.txt`: Character vocabulary (80 chars)
- `README.md`: Model card and documentation
## Training Metrics
- **Best Validation Loss**: 0.5571
- **Training completed**: 2025-08-14 00:16:11
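
To put the validation loss in more familiar units, it can be converted to per-character perplexity and bits per character. The sketch below assumes the loss is a mean cross-entropy measured in nats per character, which is the PyTorch default but is not confirmed by this file.

```python
import math

val_loss = 0.5571  # best validation loss reported above
vocab_size = 80

# Assumption: the loss is mean cross-entropy in nats per character.
perplexity = math.exp(val_loss)
bits_per_char = val_loss / math.log(2)
# Loss of a model that assigns equal probability to all 80 characters.
uniform_loss = math.log(vocab_size)

print(f"perplexity:    {perplexity:.3f}")
print(f"bits per char: {bits_per_char:.3f}")
print(f"uniform loss:  {uniform_loss:.3f}")
```

Under that assumption the loss corresponds to a perplexity of roughly 1.75 and about 0.80 bits per character, against a uniform-guess baseline of about 4.38 nats.
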