---
language:
  - en
tags:
  - text-generation
  - pytorch
  - deepseek
  - mixture-of-experts
  - moe
  - tinystories
  - language-model
  - multi-head-latent-attention
datasets:
  - roneneldan/TinyStories
---

# Deepseek-inspired TinyStories Model

This is a Deepseek-inspired model trained on the TinyStories dataset, featuring a Mixture of Experts (MoE) architecture.

## Model Details

- Model Type: Autoregressive language model with Mixture of Experts
- Architecture: Deepseek-inspired, with Multi-Head Latent Attention (MLA) and MoE layers using auxiliary-loss-free load balancing
- Parameters: ~60M
- Training Data: TinyStories dataset
- License: MIT

## Model Architecture

- Attention Heads: 8
- Embedding Dimension: 512
- Max Sequence Length: 512
- MoE Configuration:
  - Shared Experts: 2
  - Routed Experts: 4
  - Top-K Routing: 2
  - Expert Intermediate Dimension: 1536
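
The MoE layout above (2 shared experts that see every token, plus top-2 routing over 4 routed experts with bias-based, auxiliary-loss-free balancing) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class and attribute names are invented, and the step that nudges `route_bias` between training steps to equalize expert load is omitted.

```python
import torch
import torch.nn as nn


class MoESketch(nn.Module):
    """Illustrative shared + routed MoE layer (not the repository's implementation)."""

    def __init__(self, dim=512, expert_dim=1536, n_shared=2, n_routed=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        make_expert = lambda: nn.Sequential(
            nn.Linear(dim, expert_dim), nn.SiLU(), nn.Linear(expert_dim, dim)
        )
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.router = nn.Linear(dim, n_routed, bias=False)
        # Per-expert bias used only for expert *selection*; adjusted between
        # training steps to balance load, replacing an auxiliary loss term.
        self.register_buffer("route_bias", torch.zeros(n_routed))

    def forward(self, x):                       # x: (batch, seq, dim)
        out = sum(e(x) for e in self.shared)    # shared experts see every token
        scores = torch.sigmoid(self.router(x))  # (batch, seq, n_routed)
        # The bias affects which experts are picked, not the mixing weights.
        _, idx = (scores + self.route_bias).topk(self.top_k, dim=-1)
        weights = scores.gather(-1, idx)
        weights = weights / weights.sum(-1, keepdim=True)
        for k in range(self.top_k):
            for e_id, expert in enumerate(self.routed):
                mask = idx[..., k] == e_id      # tokens routed to this expert
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Because only the selection bias is tuned for balancing, the gating weights themselves stay purely score-based, which is the core idea behind auxiliary-loss-free load balancing.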

## Usage Example

### Method 1: Direct imports from the package

```python
import json
import torch
from deepseek_tinystories import (
    DeepseekInspiredModel,
    DeepSeekModelConfig,
    TinyStoriesProcessor,
    generate_text,
)

# Load config & model
with open("config.json") as f:
    config = DeepSeekModelConfig(**json.load(f))
model = DeepseekInspiredModel(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcessor()

# Generate text
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```

### Method 2: Module-specific imports

```python
import json
import torch
from deepseek_tinystories.modeling_deepseek import DeepseekInspiredModel, DeepSeekModelConfig
from deepseek_tinystories.processor import TinyStoriesProcessor
from deepseek_tinystories.utils import generate_text

# Load config & model
with open("config.json") as f:
    config = DeepSeekModelConfig(**json.load(f))
model = DeepseekInspiredModel(config)

# Load clean model weights (not a training checkpoint)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcessor()

# Generate text using the utils helper
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```