# Training Details for gptnano_5M_vanilla_full_20250813_211139
## Model Overview
- **Model Name**: gptnano_5M_vanilla_full_20250813_211139
- **Architecture**: GPTNanoLM (Character-level GPT)
- **Parameters**: 4,886,016 (approximately 5M)
- **Training Category**: vanilla_full
- **Best Validation Loss**: 0.5571
## Dataset Information
- **Dataset**: stanpony/tinystories-char-clean-az09-punct
- **Tokenizer**: Character-level (no special tokens)
- **Context Length**: 512 characters
- **Character Vocabulary Size**: 80
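
A character-level tokenizer with no special tokens is just a lookup table between characters and integer ids. The sketch below illustrates the idea. It uses a small stand-in vocabulary, since the contents and ordering of `tokenizer_chars.txt` are not reproduced here. The names `stoi`, `itos`, `encode`, `decode`, and `crop_to_context` are illustrative and are not taken from the training code.

```python
# Hypothetical stand-in vocabulary. The real one is the 80 characters in
# tokenizer_chars.txt, whose exact contents and order are not shown here.
chars = sorted(set("once upon a time, there was a little cat."))

stoi = {ch: i for i, ch in enumerate(chars)}  # character -> id
itos = {i: ch for ch, i in stoi.items()}      # id -> character

def encode(text):
    """Map each character to its id. Raises KeyError for characters outside the vocabulary."""
    return [stoi[ch] for ch in text]

def decode(ids):
    """Map ids back to a string."""
    return "".join(itos[i] for i in ids)

BLOCK_SIZE = 512  # context length stated above

def crop_to_context(ids, block_size=BLOCK_SIZE):
    """Keep only the most recent block_size ids, so the input fits the context window."""
    return ids[-block_size:]

ids = encode("a little cat")
print(decode(ids))  # round-trips to the original string
```

Because there are no special tokens, any prompt passed to the model would need to consist only of characters from the 80-character vocabulary.
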
## Model Architecture
- **Embedding Dimension**: 256
- **Number of Attention Heads**: 8
- **Number of Layers**: 6
- **Block Size (Context Length)**: 512
- **Dropout**: 0.0
- **Vocabulary Size**: 80
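
As a sanity check, the stated total of 4,886,016 parameters can be reproduced from the hyperparameters above under one set of assumptions. These are learned positional embeddings, a 4x MLP expansion, no bias on the query/key/value projections, biases on the attention output projection and both MLP layers, LayerNorm with weight and bias, and an output head that shares weights with the token embedding. The layout is inferred from the arithmetic and is not confirmed by this file, so the actual GPTNanoLM implementation may differ.

```python
# Hyperparameters from the Model Architecture section.
n_embd, n_head, n_layer = 256, 8, 6
block_size, vocab_size = 512, 80

head_dim = n_embd // n_head  # 32 dimensions per attention head

tok_emb = vocab_size * n_embd   # token embedding table
pos_emb = block_size * n_embd   # learned positional embedding (assumption)
layer_norm = 2 * n_embd         # weight + bias

# Assumption: q/k/v projections without bias, output projection with bias.
attn = 3 * n_embd * n_embd + (n_embd * n_embd + n_embd)
# Assumption: MLP expands to 4 * n_embd, both linear layers with bias.
mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
per_layer = attn + mlp + 2 * layer_norm

# Assumption: output head tied to the token embedding, so it adds nothing.
total = tok_emb + pos_emb + n_layer * per_layer + layer_norm  # + final LayerNorm
print(f"{total:,}")  # 4,886,016
```
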
## Training Configuration
- **Batch Size**: 128
- **Learning Rate**: 0.001
- **Optimizer**: AdamW
- **Epochs**: 1
- **Device**: cuda
- **Framework**: PyTorch
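
To give a sense of scale, the batch size and context length above determine how many characters the model sees per optimizer step. The sketch assumes every sequence in a batch fills the full 512-character context, which this file does not state. `steps_for_one_epoch` is a hypothetical helper, and the number of training sequences is not given here, so it is left as an argument.

```python
import math

batch_size = 128
block_size = 512

# Characters processed per optimizer step, assuming full-length sequences.
chars_per_step = batch_size * block_size
print(chars_per_step)  # 65536

def steps_for_one_epoch(num_sequences, batch_size=batch_size):
    """Optimizer steps for one pass over num_sequences samples, keeping the last partial batch."""
    return math.ceil(num_sequences / batch_size)
```
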
## Training Category Details
- **Category**: vanilla_full
- **Description**: vanilla_full
## Files in this Repository
- `model.safetensors`: Model weights (best checkpoint)
- `config.json`: Model configuration parameters
- `training_details.txt`: This file with training information
- `tokenizer_chars.txt`: Character vocabulary (80 chars)
- `README.md`: Model card and documentation
## Training Metrics
- **Best Validation Loss**: 0.5571
- **Training completed**: 2025-08-14 00:16:11
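
For interpretation, the validation loss can be converted to per-character perplexity and bits per character. The sketch assumes the reported value is a mean cross-entropy per character in nats (the usual output of PyTorch's cross-entropy loss), which this file does not state explicitly. The uniform baseline is the loss of a model that guesses evenly over all 80 characters.

```python
import math

val_loss = 0.5571   # best validation loss reported above (assumed nats per character)
vocab_size = 80

perplexity = math.exp(val_loss)             # about 1.75
bits_per_char = val_loss / math.log(2)      # about 0.80
uniform_baseline = math.log(vocab_size)     # about 4.38 nats

print(f"perplexity per character: {perplexity:.3f}")
print(f"bits per character:       {bits_per_char:.3f}")
print(f"uniform-guess loss:       {uniform_baseline:.3f}")
```
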