llama-1b-3blocks-BI-pruned-KD-bookcorpus-improved_epoch_10
Model Description
This is a knowledge-distilled version of Llama-1B trained with an improved token-chunking approach.
Training Details
- Teacher Model: meta-llama/Llama-3.2-1B
- Student Model: Mostafa8Mehrabi/llama-1b-pruned-3blocks-bi-therapy-calibration
- Dataset: bookcorpus
- Total Tokens: 10,000,000
- Chunks Created: 54,645
- Average Chunk Length: 192.0 tokens
- Overlap Size: 8 tokens
- Epoch: 10
- Training Loss: 15.0462
- Learning Rate: 1e-05
- Soft/Hard Loss Ratio: 1/0 (all soft loss, no hard loss; see the sketch after this list)
- Training Date: 2025-07-30T00:04:56.910594
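The soft/hard ratio above weights a teacher-matching loss against a ground-truth loss. Below is a minimal sketch of how such a combination is commonly implemented; the function name, temperature, and reduction choices are illustrative assumptions, not the exact training code used here.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      soft_weight=1.0, hard_weight=0.0, temperature=2.0):
    """Combine a soft (teacher) KL loss with a hard (ground-truth) CE loss.

    With soft_weight=1.0 and hard_weight=0.0 (the 1/0 ratio listed above),
    training is driven entirely by the teacher distribution.
    """
    # Soft loss: KL divergence between temperature-scaled distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard loss: standard next-token cross-entropy against the labels.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    return soft_weight * soft_loss + hard_weight * hard_loss
```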
Improvements
- Uses overlapping token chunks instead of truncating passages (see the chunking sketch after this list)
- Combines soft (teacher) and hard (ground truth) losses
- Better preservation of context across chunk boundaries
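A minimal sketch of the overlapping-chunk idea, assuming a simple sliding window over the concatenated token stream with the chunk length and overlap listed above; the helper name and details are hypothetical, not the exact preprocessing code.

```python
def make_overlapping_chunks(texts, tokenizer, chunk_len=192, overlap=8):
    """Slide a window over the full token stream instead of truncating
    each passage, so context is shared across chunk boundaries."""
    # Concatenate all passages into one token stream.
    ids = []
    for text in texts:
        ids.extend(tokenizer(text, add_special_tokens=False)["input_ids"])

    # Step forward by (chunk_len - overlap) so consecutive chunks
    # share `overlap` tokens at their boundary.
    chunks, step = [], chunk_len - overlap
    for start in range(0, max(len(ids) - overlap, 1), step):
        chunk = ids[start:start + chunk_len]
        if len(chunk) > 1:  # drop degenerate tail chunks
            chunks.append(chunk)
    return chunks
```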
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Mostafa8Mehrabi/llama-1b-3blocks-BI-pruned-KD-bookcorpus-improved_epoch_10")
model = AutoModelForCausalLM.from_pretrained("Mostafa8Mehrabi/llama-1b-3blocks-BI-pruned-KD-bookcorpus-improved_epoch_10")
```
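A quick generation check after loading; the prompt and generation settings below are illustrative only.

```python
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```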