gptnano_5M_vanilla_full_20250813_211139

A 5M-parameter character-level GPT model trained with the vanilla_full approach.

Model Details

  • Architecture: GPTNanoLM (Character-level GPT)
  • Parameters: 5M (4,886,016 total)
  • Training Category: vanilla_full
  • Context Length: 512 characters
  • Vocabulary: 80 characters
  • Best Validation Loss: 0.5571
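
As a concrete illustration of the character-level setup above, here is a minimal tokenizer sketch. It assumes tokenizer_chars.txt stores the 80-character vocabulary one character per line; the actual CharTokenizer shipped with the training code may read it differently.

```python
# Minimal character-level tokenizer sketch. Assumption: tokenizer_chars.txt
# lists the 80-character vocabulary one character per line; the real
# CharTokenizer from the training code may use a different format.
class CharTokenizerSketch:
    def __init__(self, vocab_path="tokenizer_chars.txt"):
        with open(vocab_path, encoding="utf-8") as f:
            chars = [line.rstrip("\n") for line in f]
        self.stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> char

    def encode(self, text: str) -> list[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)
```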

Training Configuration

  • Dataset: stanpony/tinystories-char-clean-az09-punct
  • Batch Size: 128
  • Learning Rate: 0.001
  • Architecture: 6 layers, 8 attention heads, 256-dimensional embeddings
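
The reported parameter count is consistent with these hyperparameters. A back-of-the-envelope check, assuming a standard GPT block (learned positional embeddings, 4x MLP expansion, biases throughout, and an LM head tied to the token embedding):

```python
d, n_layer, vocab, ctx = 256, 6, 80, 512

tok_emb = vocab * d                        # token embedding: 20,480
pos_emb = ctx * d                          # positional embedding: 131,072
attn = 4 * d * d + 4 * d                   # Q/K/V/output projections + biases
mlp = 8 * d * d + 4 * d + d                # two linears, 4x expansion, + biases
norms = 2 * 2 * d                          # two LayerNorms per block
block = attn + mlp + norms
total = tok_emb + pos_emb + n_layer * block + 2 * d  # + final LayerNorm
print(f"{total:,}")  # 4,890,624 -- within 0.1% of the reported 4,886,016;
                     # the small residual reflects bias/norm details not
                     # visible from this card
```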

Files

  • model.safetensors: Model weights (best checkpoint)
  • config.json: Model configuration
  • training_details.txt: Complete training information and usage instructions
  • tokenizer_chars.txt: Character vocabulary
  • README.md: This model card

Quick Start

⚠️ See training_details.txt for complete usage instructions and code examples.

This model uses character-level tokenization and requires the GPTNanoLM class and CharTokenizer from the original training code.
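
As a rough template (not verbatim code from training_details.txt), the sketch below shows how loading and sampling might look. The import path for GPTNanoLM and CharTokenizer, their constructor signatures, and the shape of the model's output logits are all assumptions:

```python
import json
import torch
from safetensors.torch import load_file

# Assumption: GPTNanoLM and CharTokenizer come from the original training
# code (see training_details.txt); the module path below is hypothetical.
from model import GPTNanoLM, CharTokenizer

config = json.load(open("config.json"))
model = GPTNanoLM(**config)                  # assumed to accept the config keys
model.load_state_dict(load_file("model.safetensors"))
model.eval()

tokenizer = CharTokenizer("tokenizer_chars.txt")  # assumed constructor

# Greedy character-by-character generation within the 512-character context.
ids = torch.tensor([tokenizer.encode("Once upon a time")])
with torch.no_grad():
    for _ in range(200):
        logits = model(ids[:, -512:])        # assumed output: [B, T, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
print(tokenizer.decode(ids[0].tolist()))
```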

Performance

  • Best Validation Loss: 0.5571
  • Training Category: vanilla_full
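
Assuming the validation loss is mean cross-entropy in nats per character, it corresponds to a per-character perplexity of exp(0.5571) ≈ 1.75.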

For detailed training metrics, hyperparameters, and usage examples, see the training_details.txt file.
