GPT & Prejudice
Custom GPT-style language model trained on 19th-century English literature
Model Overview
GPT & Prejudice is a decoder-only transformer language model trained from scratch on a curated corpus of 19th-century novels by ten female authors.
The model forms the foundation of the GPT & Prejudice research project, which investigates how language models encode social, moral, and cultural concepts — such as gender, class, marriage, and emotion — within their internal representations.
The project demonstrates that combining language models with Sparse Autoencoders (SAEs) provides a scalable interpretability framework capable of revealing conceptual structures embedded in literary corpora.
This model was developed as part of the research project
“GPT & Prejudice: Using Sparse Autoencoders to Uncover Conceptual Structures in 19th-Century Literature.”
(under peer review)
For the full experimental framework, including all training scripts, analysis notebooks, and the concept-probing pipeline, see the companion GitHub repository:
🔗 github.com/iug-htw/GPTAndPrejudice
🧠 Model Details
| Property | Value |
|---|---|
| Architecture | GPT-style decoder-only transformer |
| Parameters | ~124M |
| Layers | 8 |
| Attention heads | 14 |
| Embedding dimension | 896 |
| Context length | 256 tokens |
| Vocabulary size | 50,257 (GPT-2 tokenizer) |
| Dropout | 0.2 |
| Optimizer | AdamW (lr = 2e-4, wd = 0.03) |
| Activation | GELU |
| Tokenizer | GPT-2 (tiktoken.get_encoding("gpt2")) |
Training was conducted on an HPC cluster, scheduled via SLURM, on nodes equipped with 8 × NVIDIA A100 GPUs.
Individual training runs did not exceed two hours.
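For orientation, the hyperparameters in the table above map onto a configuration along the following lines. This is a sketch with illustrative key names, not the exact schema of config.json.

```python
# Illustrative configuration mirroring the Model Details table.
# Key names are placeholders; the authoritative values live in config.json.
GPT_AND_PREJUDICE_CONFIG = {
    "vocab_size": 50257,     # GPT-2 BPE vocabulary
    "context_length": 256,   # maximum sequence length in tokens
    "emb_dim": 896,          # embedding / hidden dimension
    "n_layers": 8,           # transformer blocks
    "n_heads": 14,           # attention heads (896 / 14 = 64 dims per head)
    "drop_rate": 0.2,        # dropout probability
}

# Optimizer settings from the table (AdamW).
OPTIMIZER_KWARGS = {"lr": 2e-4, "weight_decay": 0.03}
```

As a rough sanity check, and assuming tied input and output embeddings, these settings amount to about 45M embedding parameters (50,257 × 896) plus roughly 9.6M per transformer block (≈ 12 × 896²) across eight blocks, i.e. on the order of 122M parameters, consistent with the ~124M figure above.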
📚 Training Corpus
The model was trained on a 7.6M-token literary corpus comprising 37 novels by ten female authors active between 1778 and 1880.
All texts were sourced from Project Gutenberg and cleaned by removing paratext (prefaces, footnotes, metadata) and standardizing spelling and encoding.
The combined text served as the basis for training, with a 90:10 train-validation split.
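As an illustration of this step, the sketch below tokenizes the cleaned corpus with the GPT-2 BPE encoder and applies the 90:10 split. The file name corpus.txt and the contiguous split by token count are assumptions for the example; the actual preprocessing pipeline is in the GitHub repository.

```python
# Sketch: tokenize the cleaned corpus and make a 90:10 train/validation split.
# "corpus.txt" and the contiguous split are illustrative, not the exact pipeline.
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")   # same tokenizer as the model (50,257 tokens)

with open("corpus.txt", "r", encoding="utf-8") as f:
    text = f.read()

token_ids = tokenizer.encode(text)          # ~7.6M token ids for the full corpus
split_idx = int(0.9 * len(token_ids))       # 90:10 train-validation split
train_ids, val_ids = token_ids[:split_idx], token_ids[split_idx:]

print(f"train: {len(train_ids):,} tokens | validation: {len(val_ids):,} tokens")
```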
| Author | Notable Works |
|---|---|
| Jane Austen | Pride and Prejudice, Sense and Sensibility, Emma, Mansfield Park, Persuasion |
| Anne Brontë | Agnes Grey, The Tenant of Wildfell Hall |
| Charlotte Brontë | Jane Eyre, Villette, Shirley, The Professor |
| Emily Brontë | Wuthering Heights |
| Elizabeth Gaskell | North and South, Mary Barton, Wives and Daughters |
| Frances Burney | Evelina, Cecilia, Camilla, The Wanderer |
| George Eliot | Middlemarch, The Mill on the Floss |
| Maria Edgeworth | Belinda, Patronage, Castle Rackrent, Helen |
| Mary Shelley | Lodore |
| Susan Ferrier | Marriage |
📈 Intended Use
This model is primarily intended for research and interpretability studies, specifically:
- Analysis of concept formation within LLM hidden states
- Training and probing of Sparse Autoencoders (SAEs) on the model's hidden activations (a minimal sketch follows at the end of this section)
- Exploration of bias and cultural encoding in historical corpora
- Controlled generation experiments in 19th-century literary style
⚠️ The model is not optimized for modern text generation or general-purpose NLP tasks.
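For the SAE use case, the following is a minimal sketch of a sparse autoencoder trained on the model's hidden activations. The expansion factor, ReLU + L1 objective, and choice of layer are generic illustrations; the project's actual SAE architecture and training setup are documented in the GitHub repository.

```python
# Minimal sketch of a sparse autoencoder over the LM's hidden activations.
# Expansion factor, L1 coefficient, and layer choice are illustrative only.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 896, expansion: int = 4):
        super().__init__()
        self.encoder = nn.Linear(d_model, expansion * d_model)
        self.decoder = nn.Linear(expansion * d_model, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(reconstruction, activations, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the features.
    mse = torch.mean((reconstruction - activations) ** 2)
    return mse + l1_coeff * features.abs().mean()

# Toy usage on random activations with the model's hidden size (896).
sae = SparseAutoencoder()
acts = torch.randn(32, 896)
recon, feats = sae(acts)
loss = sae_loss(recon, acts, feats)
```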
💬 Example Usage
💡 For an interactive example, see
demo_huggingface_model.ipynb,
which walks through loading the model from Hugging Face and generating sample completions.
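For reference, a minimal loading-and-generation sketch is shown below. The repository ID is a placeholder, and the assumption that the model's forward pass maps token IDs to next-token logits is illustrative; the notebook above is the authoritative example.

```python
# Sketch: load the model from the Hugging Face Hub and generate text greedily.
# The repo id is a placeholder, and the forward-pass signature (token ids in,
# next-token logits out) is an assumption; see demo_huggingface_model.ipynb.
import tiktoken
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("iug-htw/GPTAndPrejudice", trust_remote_code=True)
model.eval()

tokenizer = tiktoken.get_encoding("gpt2")
ids = torch.tensor([tokenizer.encode("It is a truth universally acknowledged")])

with torch.no_grad():
    for _ in range(50):                   # generate 50 tokens greedily
        logits = model(ids[:, -256:])     # stay within the 256-token context
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0].tolist()))
```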
⚙️ Files and Configuration
| File | Description |
|---|---|
| model_896_14_8_256.pth / model.safetensors | Trained model weights |
| config.json | Model configuration |
| tokenizer.json, tokenizer_config.json, vocab.json, merges.txt | GPT-2 tokenizer files |
| gpt_model.py | Custom model implementation |
| configuration_gpt_and_prejudice.py | AutoModel configuration class for Transformers integration |
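Alternatively, the released weights can be loaded locally into the custom implementation. The class name GPTModel and its constructor below are assumptions (gpt_model.py defines the actual interface), and the configuration dictionary is the sketch from the Model Details section.

```python
# Sketch: load the released safetensors weights into the custom implementation.
# GPTModel and its constructor are assumptions; gpt_model.py defines the real API.
from safetensors.torch import load_file
from gpt_model import GPTModel  # custom model implementation shipped with the repo

state_dict = load_file("model.safetensors")   # the .pth checkpoint can be loaded with torch.load instead
model = GPTModel(GPT_AND_PREJUDICE_CONFIG)    # config sketch from the Model Details section
model.load_state_dict(state_dict)
model.eval()
```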
For the full reproducible setup, training pipeline, and analysis notebooks, visit the companion GitHub repository: 🔗 github.com/mariamkmahran/GPTAndPrejudice
⚠️ Limitations and Ethical Considerations
- The model reflects historical language from the 18th–19th centuries and may reproduce social, gender, and class biases characteristic of that period.
- Generated text may contain outdated or biased expressions consistent with the source corpus.
- The model is designed exclusively for interpretability research and bias analysis, not for modern generative applications.
Citation
If you use this work, please cite the arXiv preprint:
@misc{mahran2025gptandprejudice,
title={GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models},
author={Mariam Mahran and Katharina Simbeck},
year={2025},
eprint={2510.01252},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.01252},
}
🧾 License
- Model code and configuration: MIT License
- Literary corpus: Public domain (Project Gutenberg)
- Probing dataset: CC BY 4.0
✨ Acknowledgements
Developed at HTW Berlin (Hochschule für Technik und Wirtschaft), 2025. This work was carried out in the context of the KIWI project (16DHBKI071) that is generously funded by the Federal Ministry of Research, Technology and Space (BMFTR).
GPT & Prejudice - HTW Berlin, 2025