OpenLLM Small Extended 10k

This is the OpenLLM small model trained for 10,000 steps on the SQUAD dataset.

Model Details

  • Model Type: GPT-style transformer (decoder-only)
  • Training Steps: 10,000
  • Parameters: 35.8M
  • Vocabulary Size: 32,000
  • Context Length: 1,024 tokens
  • Architecture: 6 layers, 8 attention heads, 512 embedding dimension
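
For concreteness, the config.json shipped with the repository should encode these hyperparameters. A hypothetical model_config matching the numbers above might look like the sketch below; the field names are illustrative and may not match the OpenLLM framework's exact schema.

# Hypothetical model_config mirroring the hyperparameters above.
# Field names are illustrative; the actual OpenLLM config schema may differ.
model_config = {
    "vocab_size": 32000,   # SentencePiece BPE vocabulary
    "n_layer": 6,          # transformer blocks
    "n_head": 8,           # attention heads per block
    "n_embd": 512,         # embedding / hidden dimension
    "block_size": 1024,    # maximum context length in tokens
    "dropout": 0.1,        # matches the dropout noted under Training Details
}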

Training Information

  • Dataset: SQUAD (Stanford Question Answering Dataset)
  • Training Data: ~41k Wikipedia passages
  • Tokenizer: SentencePiece BPE with 32k vocabulary
  • Optimizer: AdamW
  • Learning Rate: 3e-4
  • Batch Size: 4 (with gradient accumulation)
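
The 32k-vocabulary SentencePiece BPE tokenizer can be reproduced roughly as sketched below; the input file and model prefix are placeholder names, not the paths actually used for this model.

# Rough sketch of training a 32k-vocab SentencePiece BPE tokenizer.
# "passages.txt" and "tokenizer" are placeholders, not the actual paths.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="passages.txt",      # one training passage per line
    model_prefix="tokenizer",  # writes tokenizer.model and tokenizer.vocab
    vocab_size=32000,
    model_type="bpe",
)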

Performance

  • Final Loss: ~5.22
  • Inference Speed: ~8.3 tokens/second (CPU)
  • Memory Usage: ~143MB for inference
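
The throughput figure can be checked with a simple timing loop. The sketch below assumes the model and tokenizer are already loaded as in the Usage section that follows, and that generate returns a [batch, sequence] tensor of token IDs; exact numbers depend on the CPU.

# Minimal throughput check; assumes model and tokenizer are loaded as in
# the Usage section and that generate returns a [batch, sequence] tensor.
import time
import torch

tokens = tokenizer.encode("The future of artificial intelligence")
inputs = torch.tensor([tokens], dtype=torch.long)

start = time.time()
with torch.no_grad():
    outputs = model.generate(inputs, max_length=100, temperature=0.7)
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs.shape[1]
print(f"~{new_tokens / elapsed:.1f} tokens/second")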

Usage

Using the Model

This model uses a custom configuration format and requires the OpenLLM framework to load properly.

# Load using the OpenLLM framework
from core.src.model import GPTModel
import json
import torch

# Load configuration
with open("config.json", "r") as f:
    config = json.load(f)

# Create model instance
model = GPTModel(config["model_config"])

# Load trained weights
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))

# Load tokenizer
import sentencepiece as spm
tokenizer = spm.SentencePieceProcessor()
tokenizer.load("tokenizer.model")

# Generate text
prompt = "The future of artificial intelligence"
tokens = tokenizer.encode(prompt)
inputs = torch.tensor([tokens], dtype=torch.long)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_length=100,
        temperature=0.7
    )

generated_text = tokenizer.decode(outputs[0].tolist())
print(generated_text)

Using the Custom Loader

from load_hf_model import load_model_and_tokenizer
import torch

# Load model using custom loader
model, tokenizer = load_model_and_tokenizer("lemms/openllm-small-extended-10k")

# Generate text
prompt = "The history of machine learning"
tokens = tokenizer.encode(prompt)
inputs = torch.tensor([tokens], dtype=torch.long)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_length=100,
        temperature=0.7
    )

print(tokenizer.decode(outputs[0].tolist()))

Model Architecture

This model follows the standard GPT architecture:

  • Token Embeddings: Maps token IDs to dense vectors
  • Positional Embeddings: Adds position information
  • Transformer Blocks: 6 layers with multi-head attention and feed-forward networks
  • Layer Normalization: Pre-norm placement for training stability
  • Output Head: Linear projection to vocabulary for next-token prediction
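
As a rough illustration of this layout (a simplified sketch, not the OpenLLM source itself), a minimal pre-norm decoder block in PyTorch could look like this:

# Simplified pre-norm decoder block illustrating the architecture above;
# not the actual OpenLLM implementation.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, n_embd=512, n_head=8, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout,
                                          batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: True above the diagonal blocks attention to future tokens.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        # Pre-norm: normalize before attention/MLP, then add the residuals.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

The full model stacks six such blocks between the token/positional embeddings and the output head described above.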

Training Details

The model was trained using:

  • Framework: PyTorch
  • Hardware: CPU training with gradient accumulation
  • Regularization: Dropout (0.1), weight decay
  • Optimization: AdamW with cosine learning rate scheduling
  • Gradient Clipping: 1.0
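
Wiring these pieces together (AdamW, cosine learning rate schedule, gradient clipping at 1.0, and gradient accumulation), a single training step might be sketched as below. The weight decay value, accumulation factor, and data pipeline are assumptions for illustration, not the exact training script.

# Illustrative optimization loop; weight_decay, accum_steps, and data_loader
# are assumptions, and the model is assumed to return per-token logits.
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)
accum_steps = 8  # assumed accumulation factor; effective batch = 4 x accum_steps

for step, (inputs, targets) in enumerate(data_loader):
    logits = model(inputs)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    (loss / accum_steps).backward()  # accumulate gradients across micro-batches

    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip at 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()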

Limitations

  • This is a small model (35.8M parameters) with limited capacity
  • Training was done on CPU, which limited the training steps
  • Output quality is basic; the model is intended for educational and research use
  • Not suitable for production use without further training

License

This model is dual-licensed:

  • Open Source: GPLv3 License
  • Commercial: Commercial License available

Citation

If you use this model in your research, please cite:

@misc{openllm2024,
  title={OpenLLM: Open Source Large Language Model Framework},
  author={Louis Chua Bean Chong},
  year={2024},
  url={https://github.com/louischua/openllm}
}

Model Card

  • Developed by: Louis Chua Bean Chong
  • Model type: Language Model
  • Language(s): English
  • License: GPLv3 / Commercial
  • Finetuned from model: Trained from scratch
  • Training data: SQUAD dataset
  • Training procedure: Supervised learning
  • Evaluation results: Basic text generation capability
