---
license: mit
datasets:
- HuggingFaceFW/fineweb
language:
- en
pipeline_tag: text-generation
widget:
- text: He is a doctor. His main goal is
example_title: ' to help people.'
- text: My name is Merve and my favorite
example_title: activity is reading.
library_name: transformers
new_version: k050506koch/GPT3-dev-125m-1202
---
# GPT3
Welcome to the GPT3 repository! This project is an attempt to recreate the architecture and approach from the original OpenAI GPT-3 paper. The repository includes scripts for training, fine-tuning, and inference of a GPT-3-like model using PyTorch and the Hugging Face Transformers library.
This repository hosts the weights of development checkpoints of my models. You can always download a checkpoint folder, paste its path into inference.py, and chat with the model.
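For example, loading a downloaded checkpoint folder might look like this (a minimal sketch; the local path is hypothetical):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical path to a checkpoint folder downloaded from this repo
local_path = "./GPT3-dev-125m-0612"

# trust_remote_code=True is needed because the architecture is custom
model = AutoModelForCausalLM.from_pretrained(local_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(local_path)
```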
# **You can find all code on [GitHub](https://github.com/krll-corp/GPT3)**
# Note: This model has 125 million parameters and was trained on 3.6B tokens. (It is, of course, very undertrained; this checkpoint is meant as a technology demonstrator.)
# Note 2: This checkpoint was released on 06/12/2024 and was trained for longer (batch size 12, gradient accumulation 4, sequence length 512, and 600,000 steps). It scores 27.65% on MMLU, which is slightly above the 25% random-guess baseline.
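For reference, those hyperparameters could be expressed with Hugging Face `TrainingArguments` roughly as below. This is only a sketch using standard `transformers` parameters; the actual training script lives in the GitHub repo, and the output directory name here is hypothetical.
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gpt3-dev-125m",       # hypothetical output directory
    per_device_train_batch_size=12,   # "batch size 12"
    gradient_accumulation_steps=4,    # "gradient accumulation 4"
    max_steps=600_000,                # "600,000 steps"
)
# The 512-token sequence length is set at tokenization time, e.g.
# tokenizer(batch, truncation=True, max_length=512).
# Tokens per optimizer step: 12 * 4 * 512 = 24,576.
```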
## Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the checkpoint; trust_remote_code=True is required because the
# architecture is implemented in this repo rather than in transformers.
model = AutoModelForCausalLM.from_pretrained('k050506koch/GPT3-dev-125m-0612', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('k050506koch/GPT3-dev-125m-0612')
tokenizer.pad_token_id = tokenizer.eos_token_id  # GPT-style models have no dedicated pad token

# Encode a prompt, sample a continuation, and decode the result
input_ids = tokenizer.encode("He is a doctor. His main goal is", return_tensors='pt')
output = model.generate(input_ids, max_length=128, temperature=0.7, top_p=0.9,
                        repetition_penalty=1.2, no_repeat_ngram_size=3,
                        num_return_sequences=1, do_sample=True)
print("\n", tokenizer.decode(output[0], skip_special_tokens=True))
```
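The same checkpoint can also be driven through the high-level `pipeline` API (a sketch; `pipeline` is standard `transformers`, and `trust_remote_code=True` is again required for the custom architecture):
```python
from transformers import pipeline

# Build a text-generation pipeline around the checkpoint
generator = pipeline("text-generation", model="k050506koch/GPT3-dev-125m-0612",
                     trust_remote_code=True)

# Sample a continuation with the same decoding settings as above
print(generator("My name is Merve and my favorite", max_length=64, do_sample=True,
                temperature=0.7, top_p=0.9)[0]["generated_text"])
```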
## Contributing
Contributions are welcome! I'm just a student who is interested in AI, so my code may be incorrect or have logical issues. Please open an issue or submit a pull request for any improvements or bug fixes; I'll be happy to review them.
## License
This project is licensed under the MIT License. See the LICENSE file for details. Everyone can use and modify this code at their discretion.
## Acknowledgements
Thanks to OpenAI, Hugging Face, and PyTorch for making this project possible!
- [OpenAI GPT-3 Paper](https://arxiv.org/abs/2005.14165)
- [Hugging Face Transformers](https://github.com/huggingface/transformers)
- [PyTorch](https://pytorch.org/)