GPT & Prejudice
Custom GPT-style language model trained on 19th-century English literature
Model Overview
GPT & Prejudice is a decoder-only transformer language model trained from scratch on a curated corpus of 19th-century novels by ten female authors.
The model forms the foundation of the GPT & Prejudice research project, which investigates how language models encode social, moral, and cultural concepts — such as gender, class, marriage, and emotion — within their internal representations.
The project demonstrates that combining language models with Sparse Autoencoders (SAEs) provides a scalable interpretability framework capable of revealing conceptual structures embedded in literary corpora.
This model was developed as part of the research project
“GPT & Prejudice: Using Sparse Autoencoders to Uncover Conceptual Structures in 19th-Century Literature.”
(under peer review)
For the full experimental framework, including all training scripts, analysis notebooks, and the concept-probing pipeline, see the companion GitHub repository:
🔗 github.com/iug-htw/GPTAndPrejudice
🧠 Model Details
| Property | Value |
|---|---|
| Architecture | GPT-style decoder-only transformer |
| Parameters | ~124M |
| Layers | 8 |
| Attention heads | 14 |
| Embedding dimension | 896 |
| Context length | 256 tokens |
| Vocabulary size | 50,257 (GPT-2 tokenizer) |
| Dropout | 0.2 |
| Optimizer | AdamW (lr = 2e-4, wd = 0.03) |
| Activation | GELU |
| Tokenizer | GPT-2 (tiktoken.get_encoding("gpt2")) |
Training was conducted on an HPC cluster, scheduled via SLURM, on nodes equipped with 8 × NVIDIA A100 GPUs.
Individual training runs did not exceed two hours.
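For orientation, the hyperparameters in the table above map onto a configuration along the following lines. This is a sketch with illustrative key names, not the exact schema of config.json.

```python
# Illustrative configuration mirroring the Model Details table.
# Key names are placeholders; the authoritative values live in config.json.
GPT_AND_PREJUDICE_CONFIG = {
    "vocab_size": 50257,     # GPT-2 BPE vocabulary
    "context_length": 256,   # maximum sequence length in tokens
    "emb_dim": 896,          # embedding / hidden dimension
    "n_layers": 8,           # transformer blocks
    "n_heads": 14,           # attention heads (896 / 14 = 64 dims per head)
    "drop_rate": 0.2,        # dropout probability
}

# Optimizer settings from the table (AdamW).
OPTIMIZER_KWARGS = {"lr": 2e-4, "weight_decay": 0.03}
```

As a rough sanity check, and assuming tied input and output embeddings, these settings amount to about 45M embedding parameters (50,257 × 896) plus roughly 9.6M per transformer block (≈ 12 × 896²) across eight blocks, i.e. on the order of 122M parameters, consistent with the ~124M figure above.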
📚 Training Corpus
The model was trained on a 7.6M-token literary corpus comprising 37 novels by ten female authors active between 1778 and 1880.
All texts were sourced from Project Gutenberg and cleaned by removing paratext (prefaces, footnotes, metadata) and standardizing spelling and encoding.
The combined text served as the basis for training, with a 90:10 train-validation split.
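As an illustration of this step, the sketch below tokenizes the cleaned corpus with the GPT-2 BPE encoder and applies the 90:10 split. The file name corpus.txt and the contiguous split by token count are assumptions for the example; the actual preprocessing pipeline is in the GitHub repository.

```python
# Sketch: tokenize the cleaned corpus and make a 90:10 train/validation split.
# "corpus.txt" and the contiguous split are illustrative, not the exact pipeline.
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")   # same tokenizer as the model (50,257 tokens)

with open("corpus.txt", "r", encoding="utf-8") as f:
    text = f.read()

token_ids = tokenizer.encode(text)          # ~7.6M token ids for the full corpus
split_idx = int(0.9 * len(token_ids))       # 90:10 train-validation split
train_ids, val_ids = token_ids[:split_idx], token_ids[split_idx:]

print(f"train: {len(train_ids):,} tokens | validation: {len(val_ids):,} tokens")
```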
| Author | Notable Works |
|---|---|
| Jane Austen | Pride and Prejudice, Sense and Sensibility, Emma, Mansfield Park, Persuasion |
| Anne Brontë | Agnes Grey, The Tenant of Wildfell Hall |
| Charlotte Brontë | Jane Eyre, Villette, Shirley, The Professor |
| Emily Brontë | Wuthering Heights |
| Elizabeth Gaskell | North and South, Mary Barton, Wives and Daughters |
| Frances Burney | Evelina, Cecilia, Camilla, The Wanderer |
| George Eliot | Middlemarch, The Mill on the Floss |
| Maria Edgeworth | Belinda, Patronage, Castle Rackrent, Helen |
| Mary Shelley | Lodore |
| Susan Ferrier | Marriage |
📈 Intended Use
This model is primarily intended for research and interpretability studies, specifically:
- Analysis of concept formation within LLM hidden states
- Training and probing of Sparse Autoencoders (SAEs) on the model's hidden activations (a minimal sketch follows at the end of this section)
- Exploration of bias and cultural encoding in historical corpora
- Controlled generation experiments in 19th-century literary style
⚠️ The model is not optimized for modern text generation or general-purpose NLP tasks.
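For the SAE use case, the following is a minimal sketch of a sparse autoencoder trained on the model's hidden activations. The expansion factor, ReLU + L1 objective, and choice of layer are generic illustrations; the project's actual SAE architecture and training setup are documented in the GitHub repository.

```python
# Minimal sketch of a sparse autoencoder over the LM's hidden activations.
# Expansion factor, L1 coefficient, and layer choice are illustrative only.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 896, expansion: int = 4):
        super().__init__()
        self.encoder = nn.Linear(d_model, expansion * d_model)
        self.decoder = nn.Linear(expansion * d_model, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(reconstruction, activations, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the features.
    mse = torch.mean((reconstruction - activations) ** 2)
    return mse + l1_coeff * features.abs().mean()

# Toy usage on random activations with the model's hidden size (896).
sae = SparseAutoencoder()
acts = torch.randn(32, 896)
recon, feats = sae(acts)
loss = sae_loss(recon, acts, feats)
```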
💬 Example Usage
💡 For an interactive example, see
demo_huggingface_model.ipynb,
which walks through loading the model from Hugging Face and generating sample completions.
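For reference, a minimal loading-and-generation sketch is shown below. The repository ID is a placeholder, and the assumption that the model's forward pass maps token IDs to next-token logits is illustrative; the notebook above is the authoritative example.

```python
# Sketch: load the model from the Hugging Face Hub and generate text greedily.
# The repo id is a placeholder, and the forward-pass signature (token ids in,
# next-token logits out) is an assumption; see demo_huggingface_model.ipynb.
import tiktoken
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("iug-htw/GPTAndPrejudice", trust_remote_code=True)
model.eval()

tokenizer = tiktoken.get_encoding("gpt2")
ids = torch.tensor([tokenizer.encode("It is a truth universally acknowledged")])

with torch.no_grad():
    for _ in range(50):                   # generate 50 tokens greedily
        logits = model(ids[:, -256:])     # stay within the 256-token context
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0].tolist()))
```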
⚙️ Files and Configuration
| File | Description |
|---|---|
| model_896_14_8_256.pth / model.safetensors | Trained model weights |
| config.json | Model configuration |
| tokenizer.json, tokenizer_config.json, vocab.json, merges.txt | GPT-2 tokenizer files |
| gpt_model.py | Custom model implementation |
| configuration_gpt_and_prejudice.py | AutoModel configuration class for Transformers integration |
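Alternatively, the released weights can be loaded locally into the custom implementation. The class name GPTModel and its constructor below are assumptions (gpt_model.py defines the actual interface), and the configuration dictionary is the sketch from the Model Details section.

```python
# Sketch: load the released safetensors weights into the custom implementation.
# GPTModel and its constructor are assumptions; gpt_model.py defines the real API.
from safetensors.torch import load_file
from gpt_model import GPTModel  # custom model implementation shipped with the repo

state_dict = load_file("model.safetensors")   # the .pth checkpoint can be loaded with torch.load instead
model = GPTModel(GPT_AND_PREJUDICE_CONFIG)    # config sketch from the Model Details section
model.load_state_dict(state_dict)
model.eval()
```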
For the full reproducible setup, training pipeline, and analysis notebooks, visit the companion GitHub repository: 🔗 github.com/mariamkmahran/GPTAndPrejudice
⚠️ Limitations and Ethical Considerations
- The model reflects historical language from the 18th–19th centuries and may reproduce social, gender, and class biases characteristic of that period.
- Generated text may contain outdated or biased expressions consistent with the source corpus.
- The model is designed exclusively for interpretability research and bias analysis, not for modern generative applications.
Citation
If you use this work, please cite the arXiv preprint:
@misc{mahran2025gptandprejudice,
title={GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models},
author={Mariam Mahran and Katharina Simbeck},
year={2025},
eprint={2510.01252},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.01252},
}
🧾 License
- Model code and configuration: MIT License
- Literary corpus: Public domain (Project Gutenberg)
- Probing dataset: CC BY 4.0
✨ Acknowledgements
Developed at HTW Berlin (Hochschule für Technik und Wirtschaft), 2025. This work was carried out in the context of the KIWI project (16DHBKI071) that is generously funded by the Federal Ministry of Research, Technology and Space (BMFTR).
GPT & Prejudice - HTW Berlin, 2025