	# Training Details for gptnano_5M_vanilla_full_20250813_211139

	## Model Overview
	- Model Name: gptnano_5M_vanilla_full_20250813_211139
	- Architecture: GPTNanoLM (Character-level GPT)
	- Parameters: 5M (4,886,016 total)
	- Training Category: vanilla_full
	- Best Validation Loss: 0.5571

	## Dataset Information
	- Dataset: stanpony/tinystories-char-clean-az09-punct
	- Tokenizer: Character-level (no special tokens)
	- Context Length: 512 characters
	- Character Vocabulary Size: 80
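
A character-level tokenizer with no special tokens amounts to a lookup table between characters and integer ids. The sketch below illustrates the idea under assumptions: `demo_chars` is a small stand-in for the real 80-character set, the helper names are hypothetical, and the on-disk layout of `tokenizer_chars.txt` is not described in this file, so no loader is shown.

```python
# Sketch of a character-level tokenizer. The vocabulary used here is a
# stand-in; the real 80-character set lives in tokenizer_chars.txt.

def build_vocab(chars):
    """Map each distinct character to an integer id, in first-seen order."""
    unique = list(dict.fromkeys(chars))
    stoi = {ch: i for i, ch in enumerate(unique)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """One id per character; raises KeyError for characters outside the vocabulary."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Inverse of encode: join the characters for each id."""
    return "".join(itos[i] for i in ids)


def truncate_to_context(ids, block_size=512):
    """Keep only the most recent block_size ids (the stated context length)."""
    return ids[-block_size:]


demo_chars = "abcdefghijklmnopqrstuvwxyz0123456789 .,!?'"  # hypothetical subset
stoi, itos = build_vocab(demo_chars)
ids = encode("once upon a time.", stoi)
assert decode(ids, itos) == "once upon a time."
```

Because there are no special tokens, a prompt longer than 512 characters has to be cut to fit; keeping the most recent characters is one reasonable choice, not necessarily what the training code did.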

	## Model Architecture
	- Embedding Dimension: 256
	- Number of Attention Heads: 8
	- Number of Layers: 6
	- Block Size (Context Length): 512
	- Dropout: 0.0
	- Vocabulary Size: 80
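
The hyperparameters above can be checked against the reported total of 4,886,016 parameters. The sketch below assumes a particular layout that this file does not confirm: learned positional embeddings, a 4x-wide MLP, LayerNorm with weight and bias, bias-free query/key/value projections, and an output head that shares weights with the token embedding. That combination reproduces the reported figure, which makes it plausible, but other layouts could match as well.

```python
def count_params(vocab_size=80, n_embd=256, n_layer=6, block_size=512):
    """Parameter count for a GPT-style model under the assumed layout."""
    tok_emb = vocab_size * n_embd              # token embedding (head assumed tied)
    pos_emb = block_size * n_embd              # learned positional embedding (assumed)
    layer_norm = 2 * n_embd                    # weight + bias
    qkv = n_embd * 3 * n_embd                  # query/key/value, assumed bias-free
    attn_out = n_embd * n_embd + n_embd        # attention output projection + bias
    mlp_up = n_embd * 4 * n_embd + 4 * n_embd  # MLP expansion + bias (4x assumed)
    mlp_down = 4 * n_embd * n_embd + n_embd    # MLP contraction + bias
    per_layer = 2 * layer_norm + qkv + attn_out + mlp_up + mlp_down
    final_norm = layer_norm                    # LayerNorm before the output head
    return tok_emb + pos_emb + n_layer * per_layer + final_norm


assert count_params() == 4_886_016
```

The number of attention heads does not appear in the count: 8 heads over a 256-dimensional embedding only splits the projections into 32-dimensional slices.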

	## Training Configuration
	- Batch Size: 128
	- Learning Rate: 0.001
	- Optimizer: AdamW
	- Epochs: 1
	- Device: cuda
	- Framework: PyTorch
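
For reference, the settings above can be gathered into a small config object. This is only a sketch: the class and method names are hypothetical, the dataset size is not stated in this file, and it assumes each training sample fills the whole 512-character block and that a final partial batch is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the Training Configuration list.
    batch_size: int = 128
    learning_rate: float = 1e-3
    optimizer: str = "AdamW"
    epochs: int = 1
    device: str = "cuda"
    block_size: int = 512  # from the Model Architecture section

    def chars_per_step(self):
        """Characters seen per optimizer step, assuming full-length samples."""
        return self.batch_size * self.block_size

    def steps_per_epoch(self, n_sequences):
        """Optimizer steps for one pass, counting a final partial batch."""
        return -(-n_sequences // self.batch_size)  # ceiling division


cfg = TrainConfig()
assert cfg.chars_per_step() == 65_536
```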

	## Training Category Details
	- Category: vanilla_full
	- Description: vanilla_full

	## Files in this Repository
	- `model.safetensors`: Model weights (best checkpoint)
	- `config.json`: Model configuration parameters
	- `training_details.txt`: This file with training information
	- `tokenizer_chars.txt`: Character vocabulary (80 chars)
	- `README.md`: Model card and documentation

	## Training Metrics
	- Best Validation Loss: 0.5571
	- Training completed: 2025-08-14 00:16:11
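
To put the validation loss in more familiar units, it can be converted to per-character perplexity and bits per character. The sketch assumes the reported loss is a mean cross-entropy in nats per character (the usual PyTorch convention); this file does not state the unit. Under that assumption, 0.5571 corresponds to roughly 1.75 perplexity and 0.80 bits per character, against a uniform-guess baseline of ln(80), about 4.38 nats.

```python
import math


def perplexity(loss_nats):
    """Per-character perplexity for a mean cross-entropy given in nats."""
    return math.exp(loss_nats)


def bits_per_char(loss_nats):
    """Convert nats per character to bits per character."""
    return loss_nats / math.log(2)


def uniform_baseline(vocab_size):
    """Loss in nats of guessing uniformly over the vocabulary."""
    return math.log(vocab_size)


best_val_loss = 0.5571
print(f"perplexity: {perplexity(best_val_loss):.3f}")
print(f"bits/char:  {bits_per_char(best_val_loss):.3f}")
print(f"uniform baseline (80 chars): {uniform_baseline(80):.3f} nats")
```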
