# My Transformer Language Model
This is a small Transformer-based language model trained from scratch on the WikiText-2 dataset. It was developed as part of an educational project to understand transformer architectures and language modeling. The model learns to predict the next token in a sequence using an autoregressive approach.
## Model Details
- Architecture: Decoder-only Transformer (GPT-like)
- Layers: 6
- Attention Heads: 8
- Hidden Size (Embedding Dimension): 512
- Block Size (Context Window): 128 tokens
- Dropout: 0.1
- Vocabulary Size: 50257 (from GPT-2 tokenizer)
- Total Parameters: ~32M (estimated for 6 layers of self-attention and feed-forward blocks with 512-dimensional embeddings and 8 attention heads)
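The exact architecture is defined in `model.py` (see Files Included). For orientation, below is a minimal sketch of a decoder-only model with these hyperparameters; the internals shown here (learned positional embeddings, pre-norm blocks built on `nn.TransformerEncoderLayer`) are illustrative assumptions and may differ from the actual implementation.

```python
import torch
import torch.nn as nn

class TransformerLM(nn.Module):
    """Illustrative decoder-only Transformer matching the hyperparameters above."""
    def __init__(self, vocab_size, embed_dim=512, n_heads=8, n_layers=6,
                 block_size=128, dropout=0.1):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, embed_dim)        # token embeddings
        self.pos_emb = nn.Embedding(block_size, embed_dim)        # learned positions (assumption)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, dim_feedforward=4 * embed_dim,
            dropout=dropout, batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ln_f = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, vocab_size, bias=False)  # LM head

    def forward(self, idx):
        # idx: (batch, seq_len) token ids with seq_len <= block_size
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: each position may only attend to itself and earlier positions
        mask = torch.triu(torch.full((t, t), float("-inf"), device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))                            # (batch, seq_len, vocab_size)
```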
## Tokenizer
- Type: `AutoTokenizer` from Transformers
- Pretrained Model: `gpt2`
- Padding Token: set to `eos_token`
- Tokenization Method: Byte-level BPE (as used in GPT-2)
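As a quick illustration of the tokenization step, the snippet below loads the GPT-2 tokenizer and shows how raw text could be encoded and split into 128-token training blocks; the `chunk` helper is an assumption for illustration, not code from the training script.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token    # GPT-2 defines no pad token by default

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text)["input_ids"]                    # byte-level BPE token ids
print(tokenizer.convert_ids_to_tokens(ids))           # inspect the subword pieces

# Hypothetical helper: split a long token stream into fixed-size training blocks
def chunk(ids, block_size=128):
    return [ids[i:i + block_size] for i in range(0, len(ids) - block_size + 1, block_size)]
```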
## Training Configuration
- Dataset: WikiText-2-raw-v1
- Epochs: 5
- Batch Size: 32
- Optimizer: AdamW
- Learning Rate: 1e-4
- Scheduler: Cosine Annealing
- Loss Function: CrossEntropyLoss (ignoring padding token)
Trained using Google Colab with GPU acceleration.
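A condensed sketch of what a training loop with this configuration could look like is shown below. The `TransformerLM` import, the `train_loader` of pre-tokenized 128-token blocks, and the per-epoch scheduler step are assumptions for illustration; the actual training script may be organized differently.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TransformerLM(vocab_size=tokenizer.vocab_size, embed_dim=512, n_heads=8,
                      n_layers=6, block_size=128, dropout=0.1).to(device)

epochs = 5
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)

for epoch in range(epochs):
    model.train()
    for batch in train_loader:                 # assumed DataLoader of (batch, block_size) id tensors
        inputs = batch[:, :-1].to(device)      # model sees tokens 0..n-1
        targets = batch[:, 1:].to(device)      # and predicts tokens 1..n (next-token objective)
        logits = model(inputs)
        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                           # cosine annealing stepped once per epoch (assumption)
```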
## Evaluation
The model was evaluated on the validation split of WikiText-2. Training and validation loss are tracked with Weights & Biases.
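Because the training objective is next-token cross-entropy, the validation loss maps directly to perplexity via `exp(loss)`. Below is a hedged sketch of such an evaluation pass; `val_loader` and the W&B logging call are assumptions, not code from this repository.

```python
import math
import torch

@torch.no_grad()
def evaluate(model, val_loader, criterion, device="cuda"):
    model.eval()
    total_loss, n_batches = 0.0, 0
    for batch in val_loader:                                  # assumed DataLoader over validation blocks
        inputs, targets = batch[:, :-1].to(device), batch[:, 1:].to(device)
        logits = model(inputs)
        total_loss += criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1)).item()
        n_batches += 1
    val_loss = total_loss / max(n_batches, 1)
    return val_loss, math.exp(val_loss)                       # perplexity = exp(mean cross-entropy)

# val_loss, val_ppl = evaluate(model, val_loader, criterion)
# wandb.log({"val_loss": val_loss, "val_perplexity": val_ppl})  # if tracking with Weights & Biases
```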
## Files Included
- `pytorch_model.bin`: Trained model weights.
- `config.json`: Training configuration.
- `model.py`: Script with the model definition.
- `tokenizer_config.json`, `special_tokens_map.json`, `vocab.json`, `merges.txt`, `tokenizer.json`: Tokenizer files.
- `README.md`: This model card.
- `generate.py` (optional, to be added): Script to generate text using the model.
## How to Use
Here's an example of how to load the model and use it for generation:
```python
import torch
from transformers import AutoTokenizer

from model import TransformerLM  # model class defined in model.py

# Load the GPT-2 tokenizer and reuse the EOS token for padding
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Instantiate the model with the same hyperparameters used for training
model = TransformerLM(
    vocab_size=tokenizer.vocab_size,
    embed_dim=512,
    n_heads=8,
    n_layers=6,
    block_size=128,
    dropout=0.1
)

# Load the trained weights (adjust the path if loading locally or from the HF Hub)
model.load_state_dict(torch.load("pytorch_model.bin"))
model.eval()

# Generate text from a prompt
prompt = "our future"
output = generate(model, tokenizer, prompt)
print(output)
```
Note: the `generate` function is a custom helper, not the built-in Transformers `generate()` API.
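Since `generate.py` is not included yet, here is a minimal sketch of what such a helper could look like, assuming the model exposes a `block_size` attribute and using simple temperature sampling; the actual implementation may differ.

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=1.0, device="cpu"):
    """Illustrative autoregressive sampler, not the project's official generate.py."""
    model.eval()
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
    for _ in range(max_new_tokens):
        context = ids[:, -model.block_size:]              # crop to the context window (assumed attribute)
        logits = model(context)[:, -1, :] / temperature   # logits for the next token only
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```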
## Limitations
- Not suitable for production or safety-critical applications.
- Trained on a small subset of text data.
- Not aligned or filtered for offensive content.
## Intended Use
This model is for educational and research purposes. It is not fine-tuned for production tasks such as conversation, summarization, or question answering.
## Author
Developed by: Tetiana Sydorenko
Project GitHub: (to be added)
## License
MIT License