---
license: mit
language: en
tags:
- llm
- pytorch
- custom-model
- causal-lm
- character-level
- math
- tiny-model
model_type: tiny-causal-llm
datasets:
- custom
pipeline_tag: text-generation
---

# TinyLLM: Character-Level Math Solver

## Model Description

**TinyLLM** is a highly compact, character-level **Causal Language Model** (based on the standard Transformer decoder architecture) trained specifically to solve single-digit math problems.

This model serves as a minimalist, educational example of how a standard LLM architecture can be trained from scratch on a very small, custom dataset.

### Key Features

* **Architecture:** Causal Transformer decoder.
* **Task:** Character-level, autoregressive text generation.
* **Input/Output:** Solves problems formatted as `N op N` and generates the answer, e.g., `4 + 5 = 9`.
* **Custom Code Required:** This is a custom PyTorch model; loading it requires the custom code in `model.py` and `tokenizer.py`.

---

## How to Use (Inference)

To load and run this custom model, download the entire repository and use the provided custom code: the `TinyLLM` class defined in **`model.py`** and the `CharacterTokenizer` defined in **`tokenizer.py`**.

### 1. Installation

First, ensure the required libraries are installed:

```bash
pip install torch huggingface-hub
```

### 2. Load the Model and Tokenizer

```python
from huggingface_hub import snapshot_download
import torch
import os
import sys

# 1. Configuration: the repository ID of this model
MODEL_ID = "anujbhatt4ai/tiny-math-llm"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2. Download all files (code and weights)
local_path = snapshot_download(repo_id=MODEL_ID)

# 3. Import the custom classes
# The downloaded path must be added to sys.path so the custom modules can be imported
sys.path.append(local_path)

from model import TinyLLM
from tokenizer import CharacterTokenizer, generate_v1_data

# 4. Set up and load the model
def load_tiny_llm():
    # In this minimal case, the known config values are hardcoded
    vocab_size = 22
    block_size = 14

    # Initialize the model with the exact trained parameters
    model = TinyLLM(
        vocab_size=vocab_size,
        block_size=block_size,
        n_embed=64,
        n_head=4,
        n_layer=4,
        dropout=0.1
    ).to(DEVICE)

    # Load the trained weights
    weights_path = os.path.join(local_path, "pytorch_model.bin")
    model.load_state_dict(torch.load(weights_path, map_location=DEVICE))
    model.eval()

    # Initialize the tokenizer
    raw_data = generate_v1_data()
    tokenizer = CharacterTokenizer(raw_data)

    return model, tokenizer

# Use the loaded model and tokenizer in your own generation logic
model, tokenizer = load_tiny_llm()
print("Model loaded and ready for math inference!")
```
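### 3. Generate an Answer

The actual generation interface lives in the repository's `model.py` and `tokenizer.py`, which are not reproduced in this card. The snippet below is a minimal greedy-decoding sketch that continues from the loading code above; the `tokenizer.encode`/`tokenizer.decode` calls and the assumption that the model's forward pass returns `(logits, loss)` are assumptions to verify against the actual code.

```python
# Hedged sketch: adjust the tokenizer and forward-pass calls to the real interfaces
# defined in tokenizer.py and model.py.
prompt = "4 + 5 = "
idx = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long, device=DEVICE)

with torch.no_grad():
    for _ in range(4):  # the answer is only a few characters long
        idx_cond = idx[:, -14:]          # crop to the block_size (14) context window
        logits, _ = model(idx_cond)      # assumes forward returns (logits, loss);
                                         # drop the unpacking if it returns logits only
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy decoding
        idx = torch.cat([idx, next_id], dim=1)

print(tokenizer.decode(idx[0].tolist()))  # expected: "4 + 5 = 9" plus any trailing characters
```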
## Training Details

### Architecture Configuration

`TinyLLM` is configured with the following parameters, taken from the `config.json` and `model.py` files:

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`vocab_size`** | `22` | Size of the character vocabulary. |
| **`block_size`** | `14` | Maximum sequence length (context window). |
| **`n_embed`** | `64` | Embedding dimension. |
| **`n_head`** | `4` | Number of attention heads. |
| **`n_layer`** | `4` | Number of Transformer decoder blocks. |
| **`dropout`** | `0.1` | Dropout rate. |

### Training Hyperparameters (from `train.py`)

| Parameter | Value |
| :--- | :--- |
| **`BATCH_SIZE`** | `32` |
| **`LEARNING_RATE`** | `1e-3` (AdamW) |
| **`EPOCHS`** | `100` |
| **`DEVICE`** | `cuda` if available, else `cpu` |

### Dataset

The model was trained on an **exhaustive set of single-digit math problems** (addition, subtraction, multiplication, and remainder-free division) whose results are also single digits (0-9).

The **`dataset.py`** file contains the logic for the sequence shift used in language-model training; a minimal sketch of that shift appears at the end of this card.

---

## Repository Files

This flat repository contains all the source code needed for full reproducibility.

| File Name | Description |
| :--- | :--- |
| **`pytorch_model.bin`** | The trained model weights. |
| **`config.json`** | Model configuration/hyperparameters. |
| **`model.py`** | **Core logic:** custom `TinyLLM` architecture definition. |
| **`tokenizer.py`** | **Core logic:** custom `CharacterTokenizer` and data generator. |
| **`dataset.py`** | Defines the `MathDataset` class and the sequence shift logic. |
| **`train.py`** | The complete training script and final hyperparameters. |
| **`custom_run.py`** (or `run.py`) | Example script demonstrating how to use the model for generation. |
| **`README.md`** | This model card and documentation. |
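---

## Appendix: Sequence Shift Sketch

As noted in the Dataset section, `dataset.py` implements the one-character shift between inputs and targets that causal language modeling requires. The sketch below illustrates the idea only; the function name `make_example`, the `stoi` mapping, and the toy vocabulary are illustrative and are not the actual `MathDataset` API.

```python
# Illustrative sketch of the next-character-prediction shift; not the real MathDataset code.
import torch

def make_example(text: str, stoi: dict):
    """Turn one problem string, e.g. "4 + 5 = 9", into an (input, target) pair."""
    ids = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)
    x = ids[:-1]   # input:  every character except the last
    y = ids[1:]    # target: the same sequence shifted left by one character
    return x, y

# Toy character vocabulary (the real CharacterTokenizer builds this mapping from the data):
stoi = {ch: i for i, ch in enumerate(sorted(set("0123456789+-*/= ")))}
x, y = make_example("4 + 5 = 9", stoi)
print(x.shape, y.shape)  # both have length len(text) - 1
```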