GPT & Prejudice

Custom GPT-style language model trained on 19th-century English literature

Model Overview

GPT & Prejudice is a decoder-only transformer language model trained from scratch on a curated corpus of 19th-century novels by ten female authors.
The model forms the foundation of the GPT & Prejudice research project, which investigates how language models encode social, moral, and cultural concepts — such as gender, class, marriage, and emotion — within their internal representations.

The project demonstrates that combining language models with Sparse Autoencoders (SAEs) provides a scalable interpretability framework capable of revealing conceptual structures embedded in literary corpora.

This model was developed as part of the research project
“GPT & Prejudice: Using Sparse Autoencoders to Uncover Conceptual Structures in 19th-Century Literature” (currently under peer review).

For the full experimental framework, including all training scripts, analysis notebooks, and the concept-probing pipeline, see the companion GitHub repository:
🔗 github.com/iug-htw/GPTAndPrejudice


🧠 Model Details

| Property | Value |
|---|---|
| Architecture | GPT-style decoder-only transformer |
| Parameters | ~124M |
| Layers | 8 |
| Attention heads | 14 |
| Embedding dimension | 896 |
| Context length | 256 tokens |
| Vocabulary size | 50,257 (GPT-2 tokenizer) |
| Dropout | 0.2 |
| Optimizer | AdamW (lr = 2e-4, wd = 0.03) |
| Activation | GELU |
| Tokenizer | GPT-2 (tiktoken.get_encoding("gpt2")) |

Training was conducted on an HPC cluster using nodes equipped with 8 × NVIDIA A100 GPUs via SLURM.
Individual training runs did not exceed two hours.
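
For quick reference, the hyperparameters above can be summarized as a plain configuration dictionary. The key names below are illustrative assumptions; the authoritative schema is defined in config.json and configuration_gpt_and_prejudice.py.

```python
# Illustrative summary of the hyperparameters above. The key names are
# assumptions chosen for readability; the authoritative schema lives in
# config.json and configuration_gpt_and_prejudice.py.
MODEL_CONFIG = {
    "vocab_size": 50257,      # GPT-2 BPE vocabulary
    "context_length": 256,    # maximum sequence length in tokens
    "emb_dim": 896,           # embedding / hidden dimension (896 / 14 heads = 64 per head)
    "n_heads": 14,            # attention heads per block
    "n_layers": 8,            # decoder blocks
    "drop_rate": 0.2,         # dropout probability
}

TRAIN_CONFIG = {
    "optimizer": "AdamW",
    "learning_rate": 2e-4,
    "weight_decay": 0.03,
}
```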


📚 Training Corpus

The model was trained on a 7.6M-token literary corpus comprising 37 novels by ten female authors active between 1778 and 1880.
All texts were sourced from Project Gutenberg and cleaned by removing paratext (prefaces, footnotes, metadata) and standardizing spelling and encoding. The combined text served as the basis for training, using a 90:10 train-validation split.

| Author | Notable Works |
|---|---|
| Jane Austen | Pride and Prejudice, Sense and Sensibility, Emma, Mansfield Park, Persuasion |
| Anne Brontë | Agnes Grey, The Tenant of Wildfell Hall |
| Charlotte Brontë | Jane Eyre, Villette, Shirley, The Professor |
| Emily Brontë | Wuthering Heights |
| Elizabeth Gaskell | North and South, Mary Barton, Wives and Daughters |
| Frances Burney | Evelina, Cecilia, Camilla, The Wanderer |
| George Eliot | Middlemarch, The Mill on the Floss |
| Maria Edgeworth | Belinda, Patronage, Castle Rackrent, Helen |
| Mary Shelley | Lodore |
| Susan Ferrier | Marriage |
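
The cleaning and splitting steps described above can be sketched roughly as follows, assuming plain-text files of the already-cleaned novels; the exact preprocessing code used for the released corpus lives in the companion GitHub repository.

```python
import glob
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")   # same tokenizer used by the model

# Concatenate the cleaned novel texts (file layout is a placeholder).
texts = []
for path in sorted(glob.glob("corpus/cleaned/*.txt")):
    with open(path, encoding="utf-8") as f:
        texts.append(f.read())
corpus = "\n\n".join(texts)

# Tokenize and apply the 90:10 train/validation split described above.
token_ids = tokenizer.encode(corpus)
split = int(0.9 * len(token_ids))
train_ids, val_ids = token_ids[:split], token_ids[split:]
print(f"{len(token_ids):,} tokens -> {len(train_ids):,} train / {len(val_ids):,} val")
```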

📈 Intended Use

This model is primarily intended for research and interpretability studies, specifically:

  • Analysis of concept formation within LLM hidden states
  • Training and probing of Sparse Autoencoders (SAEs)
  • Exploration of bias and cultural encoding in historical corpora
  • Controlled generation experiments in 19th-century literary style

⚠️ The model is not optimized for modern text generation or general-purpose NLP tasks.
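
As an illustration of the hidden-state and SAE-probing use cases listed above, residual activations can be captured with a standard PyTorch forward hook. The repository id and the module path "trf_blocks.4" below are placeholder assumptions; consult gpt_model.py for the actual module hierarchy.

```python
import torch
import tiktoken
from transformers import AutoModelForCausalLM

repo_id = "iug-htw/GPTAndPrejudice"   # placeholder -- replace with the actual Hub repo id
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

tokenizer = tiktoken.get_encoding("gpt2")
input_ids = torch.tensor([tokenizer.encode("She was a woman of high spirits and")])

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Store a detached copy that an SAE can later be trained on.
        # If the block returns a tuple, take output[0] instead.
        activations[name] = output.detach().cpu()
    return hook

# The module path "trf_blocks.4" is an assumption -- inspect
# dict(model.named_modules()).keys() to find the real block names.
layer = dict(model.named_modules())["trf_blocks.4"]
handle = layer.register_forward_hook(save_activation("block_4"))

with torch.no_grad():
    model(input_ids)
handle.remove()

hidden = activations["block_4"]   # expected shape: (batch, seq_len, 896)
```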


💬 Example Usage

💡 For an interactive example, see demo_huggingface_model.ipynb,
which walks through loading the model from Hugging Face and generating sample completions.
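
If a plain script is preferred over the notebook, the sketch below shows one way loading and sampling might look. The repository id is a placeholder, and it assumes the custom model class registered via configuration_gpt_and_prejudice.py supports generate(); the notebook remains the authoritative walkthrough.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "iug-htw/GPTAndPrejudice"   # placeholder -- replace with the actual Hub repo id

# trust_remote_code is required because the model uses a custom implementation
# (gpt_model.py / configuration_gpt_and_prejudice.py) rather than a stock GPT-2 class.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo_id)   # GPT-2 tokenizer files ship with the model

prompt = "It is a truth universally acknowledged, that"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

model.eval()
with torch.no_grad():
    # Assumes the custom model class implements generate(); otherwise sample
    # token by token from the logits returned by a forward pass.
    output_ids = model.generate(input_ids, max_new_tokens=50, do_sample=True, temperature=0.8)

print(tokenizer.decode(output_ids[0]))
```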


⚙️ Files and Configuration

| File | Description |
|---|---|
| model_896_14_8_256.pth / model.safetensors | Trained model weights |
| config.json | Model configuration |
| tokenizer.json, tokenizer_config.json, vocab.json, merges.txt | GPT-2 tokenizer files |
| gpt_model.py | Custom model implementation |
| configuration_gpt_and_prejudice.py | AutoModel configuration class for Transformers integration |
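
For workflows outside the Transformers integration, the weights can also be loaded directly from these files. The class name GPTModel and its constructor signature below are assumptions; check gpt_model.py for the actual interface.

```python
import json
import torch
from safetensors.torch import load_file
from gpt_model import GPTModel   # class name is an assumption; see gpt_model.py

with open("config.json") as f:
    cfg = json.load(f)

model = GPTModel(cfg)                          # constructor signature is an assumption
state_dict = load_file("model.safetensors")    # or: torch.load("model_896_14_8_256.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```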

For the full reproducible setup, training pipeline, and analysis notebooks, visit the companion GitHub repository: 🔗 github.com/mariamkmahran/GPTAndPrejudice


⚠️ Limitations and Ethical Considerations

  • The model reflects historical language from the 18th–19th centuries and may reproduce social, gender, and class biases characteristic of that period.
  • Generated text may contain outdated or biased expressions consistent with the source corpus.
  • The model is designed exclusively for interpretability research and bias analysis, not for modern generative applications.

Citation

If you use this work, please cite the arXiv preprint:

@misc{mahran2025gptandprejudice,
      title={GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models}, 
      author={Mariam Mahran and Katharina Simbeck},
      year={2025},
      eprint={2510.01252},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.01252}, 
}

🧾 License

  • Model code and configuration: MIT License
  • Literary corpus: Public domain (Project Gutenberg)
  • Probing dataset: CC BY 4.0

✨ Acknowledgements

Developed at HTW Berlin (Hochschule für Technik und Wirtschaft), 2025. This work was carried out in the context of the KIWI project (16DHBKI071), which is generously funded by the Federal Ministry of Research, Technology and Space (BMFTR).


GPT & Prejudice - HTW Berlin, 2025
