---
license: mit
language: en
tags:
- llm
- pytorch
- custom-model
- causal-lm
- character-level
- math
- tiny-model
model_type: tiny-causal-llm
datasets:
- custom
pipeline_tag: text-generation
---

# TinyLLM: Character-Level Math Solver

## Model Description

**TinyLLM** is a highly compact, character-level **Causal Language Model** (based on the standard Transformer decoder architecture) trained specifically to solve single-digit math problems.

This model serves as a minimalist, educational example of how a standard LLM architecture can be trained from scratch on a very small, custom dataset.

### Key Features

* **Architecture:** Causal Transformer decoder.
* **Task:** Character-level, autoregressive text generation.
* **Input/Output:** Solves problems formatted as `N op N` and generates the answer, e.g., `4 + 5 = 9`.
* **Custom Code Required:** This is a custom PyTorch model; loading it requires the custom code in `model.py` and `tokenizer.py`.

---

## How to Use (Inference)

To load and run this custom model, download the entire repository and use the provided custom code: the `TinyLLM` class defined in **`model.py`** and the `CharacterTokenizer` defined in **`tokenizer.py`**.

### 1. Installation

First, ensure the required libraries are installed:

```bash
pip install torch huggingface-hub
```

### 2. Load the Model and Tokenizer

```python
from huggingface_hub import snapshot_download
import torch
import os
import sys

# 1. Configuration: the repository ID of this model
MODEL_ID = "anujbhatt4ai/tiny-math-llm"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2. Download all files (code and weights)
local_path = snapshot_download(repo_id=MODEL_ID)

# 3. Import the custom classes
# The downloaded path must be added to sys.path so the custom modules can be imported
sys.path.append(local_path)

from model import TinyLLM
from tokenizer import CharacterTokenizer, generate_v1_data

# 4. Set up and load the model
def load_tiny_llm():
    # In this minimal case, the known config values are hardcoded
    vocab_size = 22
    block_size = 14

    # Initialize the model with the exact trained parameters
    model = TinyLLM(
        vocab_size=vocab_size,
        block_size=block_size,
        n_embed=64,
        n_head=4,
        n_layer=4,
        dropout=0.1
    ).to(DEVICE)

    # Load the trained weights
    weights_path = os.path.join(local_path, "pytorch_model.bin")
    model.load_state_dict(torch.load(weights_path, map_location=DEVICE))
    model.eval()

    # Initialize the tokenizer
    raw_data = generate_v1_data()
    tokenizer = CharacterTokenizer(raw_data)

    return model, tokenizer

# Use the loaded model and tokenizer in your own generation logic
model, tokenizer = load_tiny_llm()
print("Model loaded and ready for math inference!")
```
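### 3. Generate an Answer

The actual generation interface lives in the repository's `model.py` and `tokenizer.py`, which are not reproduced in this card. The snippet below is a minimal greedy-decoding sketch that continues from the loading code above; the `tokenizer.encode`/`tokenizer.decode` calls and the assumption that the model's forward pass returns `(logits, loss)` are assumptions to verify against the actual code.

```python
# Hedged sketch: adjust the tokenizer and forward-pass calls to the real interfaces
# defined in tokenizer.py and model.py.
prompt = "4 + 5 = "
idx = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long, device=DEVICE)

with torch.no_grad():
    for _ in range(4):  # the answer is only a few characters long
        idx_cond = idx[:, -14:]          # crop to the block_size (14) context window
        logits, _ = model(idx_cond)      # assumes forward returns (logits, loss);
                                         # drop the unpacking if it returns logits only
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy decoding
        idx = torch.cat([idx, next_id], dim=1)

print(tokenizer.decode(idx[0].tolist()))  # expected: "4 + 5 = 9" plus any trailing characters
```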
## Training Details

### Architecture Configuration

`TinyLLM` is configured with the following parameters, taken from the `config.json` and `model.py` files:

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`vocab_size`** | `22` | Size of the character vocabulary. |
| **`block_size`** | `14` | Maximum sequence length (context window). |
| **`n_embed`** | `64` | Embedding dimension. |
| **`n_head`** | `4` | Number of attention heads. |
| **`n_layer`** | `4` | Number of Transformer decoder blocks. |
| **`dropout`** | `0.1` | Dropout rate. |

### Training Hyperparameters (from `train.py`)

| Parameter | Value |
| :--- | :--- |
| **`BATCH_SIZE`** | `32` |
| **`LEARNING_RATE`** | `1e-3` (AdamW) |
| **`EPOCHS`** | `100` |
| **`DEVICE`** | `cuda` if available, else `cpu` |

### Dataset

The model was trained on an **exhaustive set of single-digit math problems** (addition, subtraction, multiplication, and remainder-free division) whose results are also single digits (0-9).

The **`dataset.py`** file contains the logic for the sequence shift used in language-model training; a minimal sketch of that shift appears at the end of this card.

---

## Repository Files

This flat repository contains all the source code needed for full reproducibility.

| File Name | Description |
| :--- | :--- |
| **`pytorch_model.bin`** | The trained model weights. |
| **`config.json`** | Model configuration/hyperparameters. |
| **`model.py`** | **Core logic:** custom `TinyLLM` architecture definition. |
| **`tokenizer.py`** | **Core logic:** custom `CharacterTokenizer` and data generator. |
| **`dataset.py`** | Defines the `MathDataset` class and the sequence shift logic. |
| **`train.py`** | The complete training script and final hyperparameters. |
| **`custom_run.py`** (or `run.py`) | Example script demonstrating how to use the model for generation. |
| **`README.md`** | This model card and documentation. |
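---

## Appendix: Sequence Shift Sketch

As noted in the Dataset section, `dataset.py` implements the one-character shift between inputs and targets that causal language modeling requires. The sketch below illustrates the idea only; the function name `make_example`, the `stoi` mapping, and the toy vocabulary are illustrative and are not the actual `MathDataset` API.

```python
# Illustrative sketch of the next-character-prediction shift; not the real MathDataset code.
import torch

def make_example(text: str, stoi: dict):
    """Turn one problem string, e.g. "4 + 5 = 9", into an (input, target) pair."""
    ids = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)
    x = ids[:-1]   # input:  every character except the last
    y = ids[1:]    # target: the same sequence shifted left by one character
    return x, y

# Toy character vocabulary (the real CharacterTokenizer builds this mapping from the data):
stoi = {ch: i for i, ch in enumerate(sorted(set("0123456789+-*/= ")))}
x, y = make_example("4 + 5 = 9", stoi)
print(x.shape, y.shape)  # both have length len(text) - 1
```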