# My Transformer Language Model
This is a small Transformer-based language model trained from scratch on the WikiText-2 dataset. It was developed as part of an educational project to understand transformer architectures and language modeling. The model learns to predict the next token in a sequence using an autoregressive approach.
## Model Details
- Architecture: Decoder-only Transformer (GPT-like)
- Layers: 6
- Attention Heads: 8
- Hidden Size (Embedding Dimension): 512
- Block Size (Context Window): 128 tokens
- Dropout: 0.1
- Vocabulary Size: 50257 (from GPT-2 tokenizer)
- Total Parameters: ~32M (estimated for 6 layers of self-attention and feed-forward blocks with 512-dimensional embeddings and 8 attention heads)
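The exact architecture is defined in `model.py` (see Files Included). For orientation, below is a minimal sketch of a decoder-only model with these hyperparameters; the internals shown here (learned positional embeddings, pre-norm blocks built on `nn.TransformerEncoderLayer`) are illustrative assumptions and may differ from the actual implementation.

```python
import torch
import torch.nn as nn

class TransformerLM(nn.Module):
    """Illustrative decoder-only Transformer matching the hyperparameters above."""
    def __init__(self, vocab_size, embed_dim=512, n_heads=8, n_layers=6,
                 block_size=128, dropout=0.1):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, embed_dim)        # token embeddings
        self.pos_emb = nn.Embedding(block_size, embed_dim)        # learned positions (assumption)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, dim_feedforward=4 * embed_dim,
            dropout=dropout, batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ln_f = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, vocab_size, bias=False)  # LM head

    def forward(self, idx):
        # idx: (batch, seq_len) token ids with seq_len <= block_size
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: each position may only attend to itself and earlier positions
        mask = torch.triu(torch.full((t, t), float("-inf"), device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))                            # (batch, seq_len, vocab_size)
```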
## Tokenizer
- Type: `AutoTokenizer` from Transformers
- Pretrained Model: `gpt2`
- Padding Token: set to `eos_token`
- Tokenization Method: Byte-level BPE (as used in GPT-2)
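As a quick illustration of the tokenization step, the snippet below loads the GPT-2 tokenizer and shows how raw text could be encoded and split into 128-token training blocks; the `chunk` helper is an assumption for illustration, not code from the training script.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token    # GPT-2 defines no pad token by default

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text)["input_ids"]                    # byte-level BPE token ids
print(tokenizer.convert_ids_to_tokens(ids))           # inspect the subword pieces

# Hypothetical helper: split a long token stream into fixed-size training blocks
def chunk(ids, block_size=128):
    return [ids[i:i + block_size] for i in range(0, len(ids) - block_size + 1, block_size)]
```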
## Training Configuration
- Dataset: WikiText-2-raw-v1
- Epochs: 5
- Batch Size: 32
- Optimizer: AdamW
- Learning Rate: 1e-4
- Scheduler: Cosine Annealing
- Loss Function: CrossEntropyLoss (ignoring padding token)
Trained using Google Colab with GPU acceleration.
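A condensed sketch of what a training loop with this configuration could look like is shown below. The `TransformerLM` import, the `train_loader` of pre-tokenized 128-token blocks, and the per-epoch scheduler step are assumptions for illustration; the actual training script may be organized differently.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TransformerLM(vocab_size=tokenizer.vocab_size, embed_dim=512, n_heads=8,
                      n_layers=6, block_size=128, dropout=0.1).to(device)

epochs = 5
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)

for epoch in range(epochs):
    model.train()
    for batch in train_loader:                 # assumed DataLoader of (batch, block_size) id tensors
        inputs = batch[:, :-1].to(device)      # model sees tokens 0..n-1
        targets = batch[:, 1:].to(device)      # and predicts tokens 1..n (next-token objective)
        logits = model(inputs)
        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                           # cosine annealing stepped once per epoch (assumption)
```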
## Evaluation
The model was evaluated on the validation split of WikiText-2. Training and validation loss are tracked with Weights & Biases.
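Because the training objective is next-token cross-entropy, the validation loss maps directly to perplexity via `exp(loss)`. Below is a hedged sketch of such an evaluation pass; `val_loader` and the W&B logging call are assumptions, not code from this repository.

```python
import math
import torch

@torch.no_grad()
def evaluate(model, val_loader, criterion, device="cuda"):
    model.eval()
    total_loss, n_batches = 0.0, 0
    for batch in val_loader:                                  # assumed DataLoader over validation blocks
        inputs, targets = batch[:, :-1].to(device), batch[:, 1:].to(device)
        logits = model(inputs)
        total_loss += criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1)).item()
        n_batches += 1
    val_loss = total_loss / max(n_batches, 1)
    return val_loss, math.exp(val_loss)                       # perplexity = exp(mean cross-entropy)

# val_loss, val_ppl = evaluate(model, val_loader, criterion)
# wandb.log({"val_loss": val_loss, "val_perplexity": val_ppl})  # if tracking with Weights & Biases
```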
## Files Included
- `pytorch_model.bin`: Trained model weights.
- `config.json`: Training configuration.
- `model.py`: Script with the model definition.
- `tokenizer_config.json`, `special_tokens_map.json`, `vocab.json`, `merges.txt`, `tokenizer.json`: Tokenizer files.
- `README.md`: This model card.
- `generate.py` (optional, to be added): Script to generate text using the model.
## How to Use
Here's an example of how to load the model and use it for generation:
```python
import torch
from transformers import AutoTokenizer

from model import TransformerLM  # model class defined in model.py

# Load the GPT-2 tokenizer and reuse the EOS token for padding
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Instantiate the model with the same hyperparameters used for training
model = TransformerLM(
    vocab_size=tokenizer.vocab_size,
    embed_dim=512,
    n_heads=8,
    n_layers=6,
    block_size=128,
    dropout=0.1
)

# Load the trained weights (adjust the path if loading locally or from the HF Hub)
model.load_state_dict(torch.load("pytorch_model.bin"))
model.eval()

# Generate text from a prompt
prompt = "our future"
output = generate(model, tokenizer, prompt)
print(output)
```
Note: the `generate` function is a custom helper, not the built-in Transformers `generate()` API.
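Since `generate.py` is not included yet, here is a minimal sketch of what such a helper could look like, assuming the model exposes a `block_size` attribute and using simple temperature sampling; the actual implementation may differ.

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=1.0, device="cpu"):
    """Illustrative autoregressive sampler, not the project's official generate.py."""
    model.eval()
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
    for _ in range(max_new_tokens):
        context = ids[:, -model.block_size:]              # crop to the context window (assumed attribute)
        logits = model(context)[:, -1, :] / temperature   # logits for the next token only
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```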
## Limitations
- Not suitable for production or safety-critical applications.
- Trained on a small subset of text data.
- Not aligned or filtered for offensive content.
## Intended Use
This model is for educational and research purposes. It is not fine-tuned for production tasks such as conversation, summarization, or question answering.
## Author
Developed by: Tetiana Sydorenko
Project GitHub: (to be added)
## License
MIT License