llama-1b-3blocks-BI-pruned-KD-bookcorpus-improved_epoch_10

Model Description

This is a knowledge-distilled version of Llama-1B trained with an improved token-chunking approach.

Training Details

  • Teacher Model: meta-llama/Llama-3.2-1B
  • Student Model: Mostafa8Mehrabi/llama-1b-pruned-3blocks-bi-therapy-calibration
  • Dataset: bookcorpus
  • Total Tokens: 10,000,000
  • Chunks Created: 54,645
  • Average Chunk Length: 192.0 tokens
  • Overlap Size: 8 tokens (see the chunking sketch after this list)
  • Epoch: 10
  • Training Loss: 15.0462
  • Learning Rate: 1e-05
  • Soft/Hard Loss Ratio: 1/0
  • Training Date: 2025-07-30T00:04:56.910594
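
The following is a minimal sketch of how overlapping chunks of this shape could be produced. The chunk length (192 tokens) and overlap (8 tokens) come from the details above; the sliding-window helper make_overlapping_chunks and its exact logic are assumptions, not the released preprocessing script.

from transformers import AutoTokenizer

def make_overlapping_chunks(text, tokenizer, chunk_len=192, overlap=8):
    # Tokenize the full passage once, then slide a window of chunk_len tokens
    # forward by (chunk_len - overlap) so consecutive chunks share `overlap` tokens.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    stride = chunk_len - overlap
    chunks = []
    for start in range(0, max(len(ids) - overlap, 1), stride):
        chunk = ids[start:start + chunk_len]
        if chunk:
            chunks.append(chunk)
    return chunks

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
chunks = make_overlapping_chunks("...", tokenizer)  # pass BookCorpus passages here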

Improvements

  • Uses overlapping token chunks instead of truncating passages
  • Combines soft (teacher) and hard (ground truth) losses (see the loss sketch below)
  • Better preservation of context across chunk boundaries
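
Below is a minimal sketch of a combined soft/hard distillation loss, assuming the usual KL-divergence-plus-cross-entropy formulation. The weights alpha=1.0 and beta=0.0 mirror the "Soft/Hard Loss Ratio: 1/0" above; the temperature T=2.0 is an assumption, since the value used in training is not stated here.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=1.0, beta=0.0, T=2.0):
    # Soft loss: KL divergence between temperature-scaled teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard loss: standard cross-entropy against the ground-truth next tokens.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * soft + beta * hard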

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Mostafa8Mehrabi/llama-1b-3blocks-BI-pruned-KD-bookcorpus-improved_epoch_10")
model = AutoModelForCausalLM.from_pretrained("Mostafa8Mehrabi/llama-1b-3blocks-BI-pruned-KD-bookcorpus-improved_epoch_10")
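
A short generation example using the objects loaded above; the prompt and sampling settings are illustrative, not part of the original card.

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))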