📘 My Transformer Language Model

This is a small Transformer-based language model trained from scratch on the WikiText-2 dataset. It was developed as part of an educational project to explore Transformer architectures and language modeling. The model is autoregressive: given a sequence of tokens, it predicts the next token.
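The card does not say how the WikiText-2 text was cut into training examples. A common scheme for next-token prediction, assumed here, concatenates the tokenized corpus and slices it into windows of `block_size + 1` tokens. The first `block_size` tokens form the input, and the same window shifted by one position forms the target. The helper name `make_blocks` is hypothetical.

```python
def make_blocks(token_ids, block_size=128):
    """Hypothetical helper: split a flat token stream into (input, target) pairs.

    Each target sequence is the input shifted one position to the right, so the
    model learns to predict token t+1 from tokens 0..t.
    """
    pairs = []
    # Each window needs block_size + 1 tokens; a trailing remainder is dropped.
    for start in range(0, len(token_ids) - block_size, block_size):
        window = token_ids[start:start + block_size + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs


# Toy example with a block size of 4 instead of the model's 128
pairs = make_blocks(list(range(10)), block_size=4)
print(pairs)  # [([0, 1, 2, 3], [1, 2, 3, 4]), ([4, 5, 6, 7], [5, 6, 7, 8])]
```

Because targets are inputs shifted by one, a single forward pass over a 128-token window yields 128 next-token predictions, one per position.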

📦 Model Details

  • Architecture: Decoder-only Transformer (GPT-like)
  • Layers: 6
  • Attention Heads: 8
  • Hidden Size (Embedding Dimension): 512
  • Block Size (Context Window): 128 tokens
  • Dropout: 0.1
  • Vocabulary Size: 50257 (from GPT-2 tokenizer)
  • Total Parameters: ~32M (estimated for 6 layers of self-attention and feed-forward networks with 512-dimensional embeddings and 8 heads)
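The ~32M figure is an estimate, and the true count depends on choices in model.py that the card does not list. The sketch below counts parameters for a standard GPT-2-style decoder under explicit assumptions: learned positional embeddings, a feed-forward width of 4 × 512, biases on every linear layer, two LayerNorms per block plus a final one, and an output layer tied to the token embeddings. The function name and its defaults are hypothetical. The number of heads does not appear, because the 8 heads split the 512-dimensional projections rather than adding weights.

```python
def estimate_params(vocab_size=50257, d_model=512, n_layers=6, block_size=128,
                    ffn_mult=4, tie_embeddings=True):
    """Hypothetical estimate for a GPT-2-style decoder (assumptions in comments)."""
    d_ff = ffn_mult * d_model                     # assumed feed-forward width
    token_emb = vocab_size * d_model
    pos_emb = block_size * d_model                # assumed learned positions
    attn = 4 * d_model * d_model + 4 * d_model    # Q, K, V, output projections + biases
    ffn = 2 * d_model * d_ff + d_ff + d_model     # two linear layers + biases
    norms = 2 * (2 * d_model)                     # two LayerNorms (scale + shift)
    per_layer = attn + ffn + norms
    final_norm = 2 * d_model
    lm_head = 0 if tie_embeddings else vocab_size * d_model  # no output bias assumed
    return token_emb + pos_emb + n_layers * per_layer + final_norm + lm_head


print(f"{estimate_params():,}")                      # 44,712,448
print(f"{estimate_params(tie_embeddings=False):,}")  # 70,444,032
```

With GPT-2 small's settings (768 dimensions, 12 layers, 1024 positions) the same formula gives 124,439,808, GPT-2's published parameter count, which is a useful sanity check. Under the assumptions above, about 25.7M parameters sit in the token embedding alone, so the card's ~32M estimate may reflect different choices in model.py. For the exact figure, count the loaded model directly with `sum(p.numel() for p in model.parameters())`.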

🔤 Tokenizer

  • Type: AutoTokenizer from Transformers
  • Pretrained Model: gpt2
  • Padding Token: set to eos_token
  • Tokenization Method: Byte-level BPE (as used in GPT-2)
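Byte-level BPE first maps every byte of the UTF-8 input to a printable Unicode character, so any string can be tokenized without an unknown-token fallback. The BPE merges listed in merges.txt are then applied to those characters. The sketch below reproduces only that first mapping step, following the `bytes_to_unicode` function in OpenAI's GPT-2 encoder; `to_byte_level` is a hypothetical helper name. It shows why GPT-2 vocabulary entries contain characters such as `Ġ`, which stands for a leading space.

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable Unicode character."""
    # Printable bytes ('!'..'~', '¡'..'¬', '®'..'ÿ') keep their own character.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Remaining bytes (space, control characters, ...) are shifted to code points >= 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


byte_encoder = bytes_to_unicode()


def to_byte_level(text):
    """Hypothetical helper: the string BPE merges would operate on."""
    return "".join(byte_encoder[b] for b in text.encode("utf-8"))


print(to_byte_level("our future"))  # ourĠfuture
```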

📊 Training Configuration

  • Dataset: WikiText-2-raw-v1
  • Epochs: 5
  • Batch Size: 32
  • Optimizer: AdamW
  • Learning Rate: 1e-4
  • Scheduler: Cosine Annealing
  • Loss Function: CrossEntropyLoss (ignoring padding token)

Trained using Google Colab with GPU acceleration.
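In PyTorch this configuration corresponds to `torch.optim.AdamW`, `torch.optim.lr_scheduler.CosineAnnealingLR` and `torch.nn.CrossEntropyLoss(ignore_index=...)`. To make the two less obvious pieces concrete without depending on PyTorch, the pure-Python sketch below computes the cosine-annealed learning rate (the closed form documented for CosineAnnealingLR, without restarts) and a cross-entropy averaged only over non-padding targets. The card does not state whether the scheduler stepped per batch or per epoch, or what minimum learning rate it used; `min_lr=0.0` is an assumption, and both function names are hypothetical.

```python
import math


def cosine_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    """Learning rate after `step` updates under cosine annealing (no warmup, no restarts)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def masked_cross_entropy(logits, targets, ignore_index):
    """Mean cross-entropy over positions whose target is not `ignore_index`.

    `logits` is a list of per-position score lists; `targets` holds token ids.
    Mirrors the default 'mean' reduction of CrossEntropyLoss with ignore_index.
    """
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padding position: contributes nothing to loss or count
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[target]
        count += 1
    return total / count if count else 0.0


# Learning rate at the start, halfway point, and end of a 1000-step schedule
print(cosine_lr(0, 1000), cosine_lr(500, 1000), cosine_lr(1000, 1000))
```

One consequence of the tokenizer setup: because the padding token is the EOS token, `pad_token_id == eos_token_id` (50256 for GPT-2). If the loss uses `ignore_index=tokenizer.pad_token_id`, genuine end-of-text targets are excluded from the loss along with padding.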

🧪 Evaluation

The model was evaluated on the validation split of WikiText-2. Training and validation losses were tracked with Weights & Biases.
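The card does not report final loss values. Language-model validation loss is commonly also reported as perplexity, the exponential of the mean per-token cross-entropy (in nats, as `CrossEntropyLoss` computes it). A minimal sketch with a hypothetical helper name: it weights each batch's mean loss by the number of target tokens in that batch, so the result is a true per-token average over the validation split rather than an average of batch averages.

```python
import math


def corpus_perplexity(batch_losses, batch_token_counts):
    """Perplexity over a dataset from per-batch mean losses (nats per token).

    Each batch mean is weighted by how many target tokens it covered, so the
    exponent is the average loss per token, not the average of batch averages.
    """
    total_nll = sum(loss * n for loss, n in zip(batch_losses, batch_token_counts))
    total_tokens = sum(batch_token_counts)
    return math.exp(total_nll / total_tokens)


# Illustrative numbers only, not results from this model
print(round(corpus_perplexity([1.0, 3.0], [3, 1]), 3))  # 4.482, i.e. exp(1.5)
```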

📎 Files Included

  • pytorch_model.bin: Trained model weights.
  • config.json: Training configuration.
  • model.py: Model architecture definition (the TransformerLM class).
  • tokenizer_config.json, special_tokens_map.json, vocab.json, merges.txt, tokenizer.json: Tokenizer.
  • README.md: This model card.
  • generate.py (optional, to be added): Script for generating text with the model.

💡 How to Use

Here’s an example of how to load and use the model for generation:

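```python
import torch
from transformers import AutoTokenizer

from model import TransformerLM  # model class defined in model.py (included in this repo)

# Load the GPT-2 tokenizer; GPT-2 has no pad token, so reuse EOS for padding
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Rebuild the architecture with the same hyperparameters used in training
model = TransformerLM(
    vocab_size=tokenizer.vocab_size,  # 50257 for GPT-2
    embed_dim=512,
    n_heads=8,
    n_layers=6,
    block_size=128,
    dropout=0.1,
)

# Load the trained weights (adjust the path if the file is stored elsewhere);
# map_location="cpu" lets the checkpoint load on machines without a GPU
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # disable dropout for inference

# Generate text with the custom generate() helper (not part of Transformers)
prompt = "our future"
with torch.no_grad():
    output = generate(model, tokenizer, prompt)
print(output)
```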

Note: generate is a custom helper function, not part of the Transformers API; the script that provides it (generate.py) is still to be added.
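Until generate.py is published, here is a minimal sketch of what such a helper could look like. It is not the author's implementation. It assumes that `TransformerLM`'s forward pass takes a `(batch, seq_len)` tensor of token ids and returns logits of shape `(batch, seq_len, vocab_size)`, either directly or as the first element of a tuple. Temperature scaling, top-k sampling, and the parameter names `max_new_tokens`, `temperature` and `top_k` are hypothetical choices. The `block_size=128` default matches the card's context window: at each step only the last 128 tokens are fed to the model.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, block_size=128,
             temperature=1.0, top_k=50):
    """Hypothetical helper: sample a continuation of `prompt` token by token.

    Assumes model(input_ids) returns logits of shape (batch, seq_len, vocab_size),
    either directly or as the first element of a tuple.
    """
    model.eval()
    device = next(model.parameters()).device
    ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

    for _ in range(max_new_tokens):
        context = ids[:, -block_size:]  # keep only the last block_size tokens
        out = model(context)
        logits = out[0] if isinstance(out, tuple) else out
        logits = logits[:, -1, :] / max(temperature, 1e-8)  # next-token logits

        if top_k is not None:
            # Mask every logit below the k-th largest one
            k = min(top_k, logits.size(-1))
            kth_value = torch.topk(logits, k).values[:, -1, None]
            logits = logits.masked_fill(logits < kth_value, float("-inf"))

        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # shape (1, 1)
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == tokenizer.eos_token_id:
            break

    return tokenizer.decode(ids[0], skip_special_tokens=True)
```

With this function defined in the same script, the loading example's call `generate(model, tokenizer, prompt)` runs as written. Passing `top_k=1` makes decoding greedy, and lower `temperature` values make sampling more conservative.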

🔒 Limitations

  • Not suitable for production or safety-critical applications.
  • Trained on a small dataset (WikiText-2 only).
  • Not aligned or filtered for offensive content.

🧠 Intended Use

This model is intended for educational and research purposes. It is not fine-tuned for production tasks such as conversation, summarization, or question answering.

✍️ Author

Developed by: Tetiana Sydorenko

Project GitHub: (to be added)

🤝 License

MIT License
