---
language:
- en
tags:
- text-generation
- pytorch
- deepseek
- mixture-of-experts
- moe
- tinystories
- language-model
- multi-head-latent-attention
datasets:
- roneneldan/TinyStories
---
# Deepseek-inspired TinyStories Model
This is a Deepseek-inspired model trained on the TinyStories dataset, featuring a Mixture of Experts (MoE) architecture.
## Model Details
- Model Type: Autoregressive Language Model with Mixture of Experts
- **Architecture**: Deepseek-inspired, with Multi-Head Latent Attention (MHLA) and MoE layers that use auxiliary-loss-free load balancing
- Parameters: ~60M
- Training Data: TinyStories dataset
- License: MIT
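The auxiliary-loss-free load balancing mentioned above can be sketched as follows. This is an illustrative toy version, not the model's actual training code: a learnable-free per-expert bias is added to the router scores for expert *selection* only, while the gating weights use the unbiased scores, and the bias is nudged after each batch so that overloaded experts become less likely to be picked. The function name, `update_rate`, and the sign-based update rule are assumptions for illustration.

```python
import torch


def route_with_bias(scores, bias, top_k=2, update_rate=0.001):
    """Sketch of auxiliary-loss-free load balancing for MoE routing.

    scores: (num_tokens, num_experts) router affinities
    bias:   (num_experts,) per-expert balancing bias
    """
    # Bias influences which experts are selected...
    biased = scores + bias
    topk_idx = biased.topk(top_k, dim=-1).indices
    # ...but the gate weights come from the unbiased scores,
    # so no auxiliary loss term is needed to balance load.
    gates = torch.gather(scores, -1, topk_idx)
    gates = torch.softmax(gates, dim=-1)

    # Count how many tokens each expert received this step.
    load = torch.zeros_like(bias).scatter_add_(
        0, topk_idx.reshape(-1), torch.ones(topk_idx.numel())
    )
    # Lower the bias of overloaded experts, raise it for underloaded ones.
    bias = bias - update_rate * torch.sign(load - load.mean())
    return topk_idx, gates, bias
```

Because the gates themselves are untouched, this steers token-to-expert assignment toward uniformity without distorting the training objective.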
## Model Architecture
- Attention Heads: 8
- Embedding Dimension: 512
- Max Sequence Length: 512
- MoE Configuration:
  - Shared Experts: 2
  - Routed Experts: 4
  - Top-K Routing: 2
  - Expert Intermediate Dimension: 1536
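To make the configuration above concrete, here is a minimal sketch of how shared and routed experts combine in a forward pass: the 2 shared experts process every token, while a router picks the top-2 of the 4 routed experts per token. The class name, module layout, and SiLU activation are illustrative assumptions, not the actual model code.

```python
import torch
import torch.nn as nn


class MoESketch(nn.Module):
    """Toy MoE layer: always-on shared experts + top-k routed experts."""

    def __init__(self, dim=512, expert_dim=1536,
                 n_shared=2, n_routed=4, top_k=2):
        super().__init__()
        make_expert = lambda: nn.Sequential(
            nn.Linear(dim, expert_dim), nn.SiLU(), nn.Linear(expert_dim, dim)
        )
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.router = nn.Linear(dim, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):                        # x: (num_tokens, dim)
        out = sum(e(x) for e in self.shared)     # shared experts see every token
        scores = self.router(x).softmax(dim=-1)
        topv, topi = scores.topk(self.top_k, dim=-1)
        topv = topv / topv.sum(dim=-1, keepdim=True)  # renormalize gates
        for k in range(self.top_k):
            for e_idx, expert in enumerate(self.routed):
                mask = topi[:, k] == e_idx       # tokens sent to this expert
                if mask.any():
                    out[mask] += topv[mask, k:k + 1] * expert(x[mask])
        return out
```

With the dimensions listed above (512 embedding, 1536 expert intermediate), each token activates 2 shared plus 2 of the 4 routed experts, which is how the active parameter count stays well below the total ~60M.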
## Usage Example
### Method 1: Direct imports from the package
```python
import json

import torch

from deepseek_tinystories import (
    DeepseekInspiredModel,
    DeepSeekModelConfig,
    TinyStoriesProcessor,
    generate_text,
)

# Load config & model
with open("config.json") as f:
    config = DeepSeekModelConfig(**json.load(f))
model = DeepseekInspiredModel(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcessor()

# Generate text
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```
### Method 2: Module-specific imports
```python
import json

import torch

from deepseek_tinystories.modeling_deepseek import DeepseekInspiredModel, DeepSeekModelConfig
from deepseek_tinystories.processor import TinyStoriesProcessor
from deepseek_tinystories.utils import generate_text

# Load config & model
with open("config.json") as f:
    config = DeepSeekModelConfig(**json.load(f))
model = DeepseekInspiredModel(config)

# Load clean model weights (not a training checkpoint)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcessor()

# Generate text using utils
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```