---
language: en
license: apache-2.0
library_name: pytorch
tags:
- transformer
- gpt
- language-model
- from-scratch
- educational
---

# Model Card for LumenBase

A 128M parameter GPT-style transformer built from scratch for educational purposes, featuring Grouped Multi-Query Attention (GQA), SwiGLU, RMSNorm, and RoPE.

## Model Details

### Model Description

LumenBase is a decoder-only transformer language model implementing modern architectural optimizations:

- **Architecture**: 12-layer transformer with GQA (12 query heads, 4 KV heads), SwiGLU activation, RMSNorm, and RoPE
- **Parameters**: 128M (768 hidden size, 3072 FFN, 2048 context length)
- **Training**: Mixed precision (FP16/BF16) with custom tokenizer (32K vocab)

- **Developed by:** Hariom Jangra
- **Model type:** Decoder-only Transformer
- **Language:** English
- **License:** MIT
- **Repository:** https://github.com/HariomJangra/project-lumen

## Uses

**Direct Use:**
- Text generation and completion
- Educational resource for understanding transformer architecture
- Research baseline for language models
- Foundation for fine-tuning on specific tasks

**Downstream Use:**
- Instruction tuning
- Chat applications
- Domain-specific fine-tuning

**Out-of-Scope:**
- Production deployments
- Safety-critical applications
- Applications requiring factual accuracy without verification
- This is an educational model - use established frameworks for production

## Limitations

**Technical:**
- Limited size (128M parameters) - below state-of-the-art performance
- 2048 token context window
- May generate incoherent text for complex prompts

**Bias & Safety:**
- May perpetuate training data biases
- Not evaluated for fairness across demographics
- Can generate inappropriate content
- Should not be relied upon for factual information

**Recommendations:** This is an educational model. Verify all outputs, implement content filtering for applications, and use production-ready models for commercial use.

## Training

**Data:** Custom datasets tokenized with BPE (32K vocab)

**Hyperparameters:**
- Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
- Batch: 12 × 4 gradient accumulation steps = 48 effective
- Sequence length: 2048 tokens
- Scheduler: Linear warmup + cosine annealing
- Precision: Mixed (FP16/BF16/FP32)
- Dropout: 0.1 (training), 0.0 (inference)

![Training Loss](training_loss_curve.png)

## Evaluation

Evaluated on standard NLP benchmarks:

| Benchmark | Accuracy | Correct/Total |
|-----------|----------|---------------|
| **ARC-Easy** | 39.48% | 938/2,376 |
| **ARC-Challenge** | 23.55% | 276/1,172 |
| **HellaSwag** | 32.62% | 334/1,024 |

**Summary:** Baseline performance consistent with a 128M educational model. Results show capability on easier tasks with room for improvement on complex reasoning.
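The card does not state which harness produced these numbers. For readers who want to reproduce a comparable evaluation, below is a minimal sketch of the standard way multiple-choice benchmarks such as ARC and HellaSwag are scored with a causal LM: each answer choice is appended to the question context and the length-normalized log-likelihood of the choice tokens is compared. The helper names (`score_choice`, `predict`) and the assumption that `model(input_ids)` returns `[batch, seq, vocab]` logits are illustrative, not part of the released code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_choice(model, tokenizer, context: str, choice: str) -> float:
    """Length-normalized log-likelihood of `choice` given `context`.
    Assumes the model's forward pass returns logits of shape [batch, seq, vocab]."""
    ctx_ids = tokenizer.encode(context).ids
    cho_ids = tokenizer.encode(choice).ids
    input_ids = torch.tensor([ctx_ids + cho_ids])
    logits = model(input_ids)  # assumed forward signature
    # Position i predicts token i+1, so the choice tokens are predicted by
    # positions len(ctx_ids)-1 ... len(ctx_ids)+len(cho_ids)-2.
    log_probs = F.log_softmax(logits[0, len(ctx_ids) - 1 : -1], dim=-1)
    token_lp = log_probs[torch.arange(len(cho_ids)), torch.tensor(cho_ids)]
    return token_lp.mean().item()

def predict(model, tokenizer, context: str, choices: list[str]) -> int:
    """Return the index of the highest-scoring answer choice."""
    scores = [score_choice(model, tokenizer, context, c) for c in choices]
    return max(range(len(choices)), key=lambda i: scores[i])
```

Accuracy is then simply the fraction of questions where `predict` matches the gold answer index.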
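The training recipe listed in the Training section above maps onto a standard PyTorch loop. The following is a minimal sketch, assuming the same logits-returning forward pass as above and an FP16 autocast path with a gradient scaler (a BF16 path would typically drop the scaler); the warmup and total step counts are placeholders, not values from the card, and the actual training script in the repository may differ.

```python
import math
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def train(model, train_loader, total_steps=100_000, warmup_steps=1_000,
          accum_steps=4, lr=3e-4, weight_decay=0.1):
    """Sketch of AdamW + linear warmup / cosine annealing with gradient accumulation
    (micro-batch 12 x 4 accumulation steps = 48 effective, per the card)."""
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)

    def lr_lambda(step):
        # Linear warmup, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = LambdaLR(optimizer, lr_lambda)
    scaler = torch.cuda.amp.GradScaler()  # FP16 mixed precision

    for step, (input_ids, targets) in enumerate(train_loader):
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            logits = model(input_ids)  # assumed to return [batch, seq, vocab] logits
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                                   targets.view(-1)) / accum_steps
        scaler.scale(loss).backward()
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
            scheduler.step()
```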
## Technical Specifications

**Architecture:** Decoder-only Transformer
- 12 layers, 768 hidden size, 12 attention heads (4 KV heads)
- SwiGLU FFN (3072 intermediate), RMSNorm, RoPE
- 32K vocab, 2048 max sequence length
- Weight tying between embedding and output layers

(A minimal sketch of the RMSNorm, SwiGLU, and GQA building blocks appears at the end of this card.)

**Implementation:** Custom PyTorch implementation from scratch

**Software:** Python 3.13, PyTorch, NumPy, Tokenizers, tqdm, matplotlib

## How to Use

```python
import torch
from safetensors.torch import load_file
from ModelArchitecture import Transformer, ModelConfig, generate
from tokenizers import Tokenizer

# Load configuration and model
config = ModelConfig(vocab_size=32000, hidden_size=768, n_heads=12, n_kv_heads=4,
                     n_kv_groups=3, head_dim=64, n_layers=12, intermediate_size=3072,
                     max_position_embeddings=2048, dropout=0.0, pre_norm=True,
                     tie_weights=True)
model = Transformer(config)
# .safetensors checkpoints are loaded with safetensors, not torch.load
model.load_state_dict(load_file('model.safetensors'))
model.eval()

# Generate text
tokenizer = Tokenizer.from_file('tokenizer.json')
prompt = "Once upon a time"
input_ids = torch.tensor([tokenizer.encode(prompt).ids])
output = generate(model, input_ids, max_new_tokens=100, temperature=0.8, top_k=50, top_p=0.9)
print(tokenizer.decode(output[0].tolist()))
```

## Citation

```bibtex
@misc{lumenbase2024,
  author = {Jangra, Hariom},
  title = {LumenBase: A 128M Parameter Language Model Built from Scratch},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/HariomJangra/project-lumen}}
}
```

## Contact

**Author:** Hariom Jangra ([@HariomJangra](https://github.com/HariomJangra))

For questions or feedback, please open an issue on the [GitHub repository](https://github.com/HariomJangra/project-lumen).
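Since the card is intended as a learning resource, here is a minimal, self-contained sketch of the building blocks named in the Technical Specifications: RMSNorm, a SwiGLU feed-forward block, and the KV-head expansion used by grouped-query attention. Dimensions follow the specifications above (768 hidden, 3072 intermediate, 4 KV heads shared by 12 query heads, i.e. `n_kv_groups=3` as in the usage example); class and variable names are illustrative and do not necessarily match the `ModelArchitecture` module in the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the hidden vector (no mean subtraction)."""
    def __init__(self, dim: int = 768, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: SiLU-gated up-projection to 3072, then back down to 768."""
    def __init__(self, hidden_size: int = 768, intermediate_size: int = 3072):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

def expand_kv_heads(kv: torch.Tensor, n_kv_groups: int = 3) -> torch.Tensor:
    """GQA: repeat each of the 4 KV heads so they are shared by 12 query heads (groups of 3).
    `kv` has shape [batch, n_kv_heads, seq_len, head_dim]."""
    return kv.repeat_interleave(n_kv_groups, dim=1)
```

With the 4 KV heads expanded to match the 12 query heads, attention proceeds exactly as in standard multi-head attention, while the KV cache stays a third of the size it would be with full multi-head attention.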