GPT-3 175B Architecture Demonstrator
This repository contains an architectural demonstrator of GPT-3 (175B parameters), following the architecture described in the original GPT-3 paper, "Language Models are Few-Shot Learners" (Brown et al., 2020). Implementation details are available in my GitHub repository.
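For reference, the headline hyperparameters of the 175B configuration, as reported in Table 2.1 of the paper, are sketched below. The `GPT3Config` dataclass is a hypothetical container introduced here for illustration; only the numbers come from the paper.

```python
from dataclasses import dataclass

@dataclass
class GPT3Config:
    # 175B-model hyperparameters from Table 2.1 of
    # "Language Models are Few-Shot Learners" (Brown et al., 2020).
    n_layers: int = 96        # transformer decoder blocks
    d_model: int = 12288      # hidden (residual stream) width
    n_heads: int = 96         # attention heads per layer
    d_head: int = 128         # per-head dimension (d_model / n_heads)
    n_ctx: int = 2048         # context window in tokens
    vocab_size: int = 50257   # GPT-2 BPE vocabulary (reused by GPT-3)

config = GPT3Config()
assert config.d_model == config.n_heads * config.d_head
```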
Important Note
This repository includes an untrained architectural demonstrator: it matches the GPT-3 architecture, but its weights are randomly initialized, so the model has no meaningful predictive ability out of the box. At extremely low temperature settings (e.g., temperature=0.0001), sampling collapses to picking the single most probable token at each step, so the model may appear to generate semi-coherent text even though its outputs remain essentially random.
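The following is a minimal sketch of temperature-scaled sampling, independent of the repository's actual generation code, showing why a near-zero temperature makes even a randomly initialized model look deterministic: dividing the logits by a tiny temperature sharpens the softmax until virtually all probability mass sits on the argmax token.

```python
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Sample a token id from logits after temperature scaling."""
    scaled = logits / max(temperature, 1e-8)  # guard against division by zero
    scaled -= scaled.max()                    # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(np.random.choice(len(probs), p=probs))

logits = np.random.randn(50257)  # random-weight model: arbitrary logits
print(sample_token(logits, temperature=0.0001))  # effectively argmax, stable
print(sample_token(logits, temperature=1.0))     # genuinely random draw
```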
Training
You can train this demonstrator from scratch or fine-tune it using the scripts and configuration files provided in my GitHub repository. These files are fully compatible with this architecture demonstrator and are designed to simplify the training process; a sketch of the underlying objective follows below.
Refer to the repository for detailed instructions and best practices.
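The actual training scripts live in the repository; the sketch below only illustrates the next-token cross-entropy objective that such training uses. The toy embedding-plus-linear model is a stand-in introduced here so the snippet runs anywhere; swap in the real demonstrator model in practice (the learning rate of 6e-5 matches the value the paper reports for the 175B model).

```python
import torch
import torch.nn.functional as F

vocab_size, d_model = 50257, 64
# Toy stand-in model; replace with the demonstrator from the repository.
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, d_model),
    torch.nn.Linear(d_model, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5)

tokens = torch.randint(0, vocab_size, (4, 128))  # placeholder batch of token ids
for step in range(10):
    logits = model(tokens[:, :-1])               # predict the next token
    loss = F.cross_entropy(
        logits.reshape(-1, vocab_size),          # (batch * seq, vocab)
        tokens[:, 1:].reshape(-1),               # targets shifted by one
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.3f}")
```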
Contributions are welcome!