PolyPythias
This model is part of the PolyPythias suite, an extension of the Pythia project providing 45 additional training runs across 5 model sizes with 9 different random seeds each. These models enable systematic study of training stability and reproducibility in language models.
Paper
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal, Pietro Lesci, Max Muller-Eberstein, Naomi Saphra, Hailey Schoelkopf, Willem Zuidema, and Stella Biderman. ICLR 2025.
Model Details
| Size | Parameters | Layers | Model Dim | Heads | Original Model |
|---|---|---|---|---|---|
| 14M | 14M | 6 | 128 | 4 | pythia-14m |
| 31M | 31M | 6 | 256 | 8 | pythia-31m |
| 70M | 70M | 6 | 512 | 8 | pythia-70m |
| 160M | 160M | 12 | 768 | 12 | pythia-160m |
| 410M | 410M | 24 | 1024 | 16 | pythia-410m |
All models were trained on 300B tokens from The Pile.
Naming Convention
- pythia-{size}m: Original Pythia model (seed 1234)
- pythia-{size}m-seed{1-9}: PolyPythias variants with different random seeds
- pythia-160m-data-seed{1-3}: 160M models with only data ordering varied (weight initialization fixed)
- pythia-160m-weight-seed{1-3}: 160M models with only weight initialization varied (data order fixed)
The decoupled seed variants (data-seed and weight-seed) allow researchers to separately study the effects of data ordering vs. weight initialization.
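For example, a minimal sketch (assuming the repo IDs follow the naming convention above, hosted under the EleutherAI organization) that enumerates every 160M run:

```python
# Sketch: build the Hugging Face repo IDs for all 160M runs, assuming the
# naming convention above maps directly onto repo names under EleutherAI.
org = "EleutherAI"
size = "160m"

repo_ids = [f"{org}/pythia-{size}"]                                        # original run (seed 1234)
repo_ids += [f"{org}/pythia-{size}-seed{s}" for s in range(1, 10)]         # full seed variants
repo_ids += [f"{org}/pythia-{size}-data-seed{s}" for s in range(1, 4)]     # data order varied only
repo_ids += [f"{org}/pythia-{size}-weight-seed{s}" for s in range(1, 4)]   # weight init varied only

print(len(repo_ids), "runs")  # 16 runs for the 160M size
```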
Quick Start
```python
from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Load the final checkpoint
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m-seed3")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-seed3")

# Generate text
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```
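Because the suite is designed for cross-seed comparisons, a small sketch (the seed choices here are arbitrary) contrasting greedy continuations from two 70M runs that differ only in seed:

```python
from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Compare greedy continuations from two runs of the same size but different seeds.
prompt = "The quick brown fox"
for repo in ["EleutherAI/pythia-70m-seed3", "EleutherAI/pythia-70m-seed7"]:
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = GPTNeoXForCausalLM.from_pretrained(repo)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(repo, "->", tokenizer.decode(outputs[0]))
```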
Available Checkpoints
Each model provides 154 intermediate checkpoints saved as Git branches:
| Checkpoint | Training Tokens | Description |
|---|---|---|
| step0 | 0 | Initialization (before training) |
| step1, step2, step4, ..., step512 | 2M - 1B | 10 log-spaced early checkpoints |
| step1000, step2000, ..., step143000 | 2B - 300B | 143 evenly spaced checkpoints |
To load a specific checkpoint:
```python
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-seed3",
    revision="step50000",  # Any checkpoint step
)
```
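The checkpoint schedule described above can also be generated programmatically, which is convenient for sweeping over training time. A sketch (the analysis loop is a placeholder):

```python
from transformers import GPTNeoXForCausalLM

# Rebuild the 154 checkpoint revisions: step0, ten log-spaced early steps,
# then every 1000 steps up to step143000.
revisions = ["step0"]
revisions += [f"step{2**i}" for i in range(10)]                 # step1 ... step512
revisions += [f"step{i}" for i in range(1000, 144000, 1000)]    # step1000 ... step143000
assert len(revisions) == 154

# Example: load every tenth checkpoint of one run for a training-dynamics analysis.
for rev in revisions[::10]:
    model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m-seed3", revision=rev)
    # ... analyze `model` here ...
```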
Training Data
All models were trained on The Pile using pre-shuffled data orderings. The shuffled index files for each seed are available at:
EleutherAI/pile-preshuffled-seeds
This dataset contains the .idx files for seeds 0-9, which MMapIndexedDataset uses to read the memory-mapped Pile data in the correct order for each seed.
Reproducing Training Data Order
To reproduce the exact data ordering used for a specific seed:
- Download the Pile dataset and tokenize it using the Pythia tokenizer
- Download the corresponding seed folder from pile-preshuffled-seeds:

```python
# Using huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="EleutherAI/pile-preshuffled-seeds",
    repo_type="dataset",
    allow_patterns="seed3/*",  # Download only seed3
    local_dir="./pile-seeds",
)
```

- Use the .idx files with GPT-NeoX's MMapIndexedDataset:

```python
from dataset import MMapIndexedDataset

dataset = MMapIndexedDataset(path_prefix, skip_warmup=True)
```
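Once the index files are in place, a hedged sketch of spot-checking the ordering (assuming each dataset item is an array of token IDs, and using a hypothetical path prefix that points at your tokenized Pile paired with the seed3 .idx file):

```python
from transformers import AutoTokenizer
from dataset import MMapIndexedDataset  # GPT-NeoX / Pythia utility

# Hypothetical prefix: point this at your tokenized Pile .bin plus the seed3 .idx file.
path_prefix = "./pile-seeds/seed3/pile"
dataset = MMapIndexedDataset(path_prefix, skip_warmup=True)

# Decode the first training sequence to spot-check the data order for this seed.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
first_sequence = dataset[0]  # assumed to be an array of token IDs
print(tokenizer.decode(first_sequence[:64]))
```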
For complete training reproduction instructions, see the Pythia GitHub repository.
All PolyPythias Models
The complete collection is available at: EleutherAI/polypythias
14M Parameter Models
- pythia-14m-seed1 through pythia-14m-seed9
31M Parameter Models
- pythia-31m-seed1 through pythia-31m-seed9
70M Parameter Models
- pythia-70m-seed1 through pythia-70m-seed9
160M Parameter Models
- pythia-160m-seed1 through pythia-160m-seed9
- pythia-160m-data-seed1 through pythia-160m-data-seed3
- pythia-160m-weight-seed1 through pythia-160m-weight-seed3
410M Parameter Models
- pythia-410m-seed1 through pythia-410m-seed9
Evaluation Results
Evaluation results for all models are available in the polypythias-evals dataset.
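For offline analysis, the raw result files can be fetched with huggingface_hub; a sketch assuming the dataset lives at EleutherAI/polypythias-evals (per the link above):

```python
from huggingface_hub import snapshot_download

# Download the evaluation results; the repo ID is assumed from the link above.
local_path = snapshot_download(
    repo_id="EleutherAI/polypythias-evals",
    repo_type="dataset",
    local_dir="./polypythias-evals",
)
print("Downloaded to:", local_path)
```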
Limitations
These models are released for research purposes only. They are not intended for deployment in production systems.
- Not instruction-tuned: These are base language models that predict the next token; they will not follow instructions the way instruction-tuned assistants such as ChatGPT do
- May generate harmful content: The Pile contains diverse internet text that includes biased, offensive, and factually incorrect content
- English only: Models were trained primarily on English text
- No safety filtering: Outputs are not filtered for safety or accuracy
License
Apache 2.0
Contact
For questions about these models, please use:
- EleutherAI Discord - #release-discussion channel
- GitHub Issues
Citation
If you use these models, please cite:
```bibtex
@inproceedings{vanderwal2025polypythias,
  title={PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs},
  author={van der Wal, Oskar and Lesci, Pietro and Muller-Eberstein, Max and Saphra, Naomi and Schoelkopf, Hailey and Zuidema, Willem and Biderman, Stella},
  booktitle={International Conference on Learning Representations},
  year={2025},
  url={https://arxiv.org/abs/2503.09543}
}
```