# GPT-2 from Scratch

This model follows the GPT-2 architecture (125M parameters) and was trained from scratch.

## Model Description
- Model type: GPT-2 (125M parameters)
- Architecture: Transformer-based autoregressive language model following the original GPT-2 design
- Training data: multiple datasets (see the model tags), roughly 18 billion tokens
- Language: English
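If the checkpoint is published in the standard Hugging Face GPT-2 format (an assumption; the card does not state how the weights were exported), it can be loaded as a regular causal LM. A minimal usage sketch:

```python
# Minimal usage sketch. Assumes the repo id from the evaluation table below
# hosts a checkpoint in standard Hugging Face GPT-2 format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thecr7guy/gpt2-pretrain"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The history of the internet begins with", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```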
## Performance and Evaluation
| Dataset | Metric | thecr7guy/gpt2-pretrain | GPT-2 (baseline) |
|---|---|---|---|
| HellaSwag | acc | 0.291 | 0.289 |
| SciQ | acc | 0.754 | 0.752 |
| Winogrande | acc | 0.491 | 0.516 |
| TruthfulQA MC1 | acc | 0.236 | 0.228 |
| MMLU (overall) | acc | 0.230 | 0.229 |
| MMLU: Humanities | acc | 0.242 | 0.242 |
| MMLU: Social Sciences | acc | 0.217 | 0.217 |
| MMLU: STEM | acc | 0.213 | 0.213 |
| MMLU: Other | acc | 0.239 | 0.238 |
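The card does not state which tool produced these scores. The sketch below assumes EleutherAI's lm-evaluation-harness (v0.4+) and its standard task names; the batch size is a placeholder, not a reported setting:

```python
# Evaluation sketch, assuming lm-evaluation-harness (the card does not name the tool).
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=thecr7guy/gpt2-pretrain",
    tasks=["hellaswag", "sciq", "winogrande", "truthfulqa_mc1", "mmlu"],
    batch_size=8,  # placeholder batch size, not from the card
)
print(results["results"])
```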
## Training Details

- Training corpus: approximately 18B tokens (120 GB)
- Training duration: 1 epoch (approximately 8 hours total)
- Hardware: 8× NVIDIA A100 PCIe GPUs via runpod.io
- Estimated cost: 8 × $13.52 ≈ $108 for the complete training run
- Context length: 1024 tokens
## Hyperparameters
- context_len: 1024
- seed: 42
- epochs: 2
- batch_size: 64
- total_batch_size: 524288 tokens
- grad_clip: 1.0
- optimizer: "adamw" (see the optimizer and LR schedule sketch after this list)
- max_lr: 6.0e-4
- min_lr: 6.0e-5
- beta1: 0.9
- beta2: 0.95
- weight_decay: 0.1
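The max_lr/min_lr pair implies a decaying schedule, but the card does not name it. Below is a minimal sketch assuming linear warmup followed by cosine decay (the warmup length and decay shape are assumptions), wired to the AdamW settings listed above:

```python
# Sketch of the optimizer and an assumed warmup + cosine-decay LR schedule.
# Only max_lr, min_lr, betas, weight_decay, and grad_clip come from the card.
import math
import torch
from transformers import GPT2Config, GPT2LMHeadModel

model = GPT2LMHeadModel(GPT2Config())  # stand-in model; train.py builds its own GPT-2

max_lr, min_lr = 6.0e-4, 6.0e-5
tokens_per_step = 524_288                        # total_batch_size from the list above
max_steps = 18_000_000_000 // tokens_per_step    # ~34,332 steps for ~18B tokens
warmup_steps = 700                               # assumed; the card gives no warmup length

def get_lr(step: int) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr (assumed shape)."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # decays from 1 to 0
    return min_lr + coeff * (max_lr - min_lr)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=max_lr,
    betas=(0.9, 0.95),
    weight_decay=0.1,
)

# Each step: update the LR, clip gradients at grad_clip = 1.0, then optimizer.step().
#   for group in optimizer.param_groups:
#       group["lr"] = get_lr(step)
#   torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
```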
## Commands used during setup and training
- pip install wandb
- pip install tiktoken
- pip install --upgrade huggingface_hub
- pip install torchinfo
- pip install datasets
- sudo apt update && sudo apt install tmux
- tmux new -s training
- wandb login
- CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NCCL_P2P_DISABLE=1 torchrun --standalone --nproc_per_node=8 train.py
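The torchrun launch implies one process per GPU with DistributedDataParallel inside train.py. A sketch of that setup (an assumption about train.py's internals, which the card does not show):

```python
# DDP setup sketch for the 8-GPU torchrun launch above.
# This is an assumed outline of train.py, not its actual contents.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import GPT2Config, GPT2LMHeadModel

dist.init_process_group(backend="nccl")      # torchrun provides RANK/WORLD_SIZE/LOCAL_RANK
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = GPT2LMHeadModel(GPT2Config()).cuda(local_rank)  # stand-in model
model = DDP(model, device_ids=[local_rank])

# ... training loop: each rank processes a different shard of the ~18B-token corpus ...

dist.destroy_process_group()
```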
## Contact
GitHub: thecr7guy2