---
language: en
license: mit
tags:
- pytorch
- text-generation
- gemma3
- tinystories
---

# Gemma3-270M Pre-trained on TinyStories

This is a Gemma3-270M model pre-trained on the TinyStories dataset for 150k iterations.

## Model Details

- **Architecture**: Gemma3-270M
- **Training Data**: TinyStories dataset from Hugging Face
- **Training Iterations**: 150,000
- **Parameters**: ~270M unique parameters
- **Tokenizer**: GPT-2 tokenizer (tiktoken)
- **Training Loss**: available in the training history

## Quick Start

### Download the Model

```python
from huggingface_hub import hf_hub_download

# Download the model weights
model_path = hf_hub_download(
    repo_id="vuminhtue/gemma3_270m_150k_tinystories",
    filename="Gemma3_270m_150k_model_params.pt",
)

# Download the config
config_path = hf_hub_download(
    repo_id="vuminhtue/gemma3_270m_150k_tinystories",
    filename="config.json",
)
```
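
The downloaded `config.json` is not used by the snippet below, which hard-codes the hyperparameters. If the file stores the same keys as the `GEMMA3_CONFIG` dict (an assumption; check the repo's actual `config.json`), you could read them instead:

```python
import json
import torch

# Sketch: rebuild the config dict from the downloaded config.json.
# Assumes the JSON file uses the same keys as GEMMA3_CONFIG below.
with open(config_path) as f:
    GEMMA3_CONFIG = json.load(f)

# torch dtypes are not JSON-serializable, so restore the dtype by hand.
GEMMA3_CONFIG["dtype"] = torch.bfloat16
```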

### Load and Use

```python
import torch
import tiktoken
from Gemma3_model import Gemma3Model  # requires the Gemma3_model.py file from the original training code

# Set up the configuration
GEMMA3_CONFIG = {
    "vocab_size": 256000,
    "context_length": 8192,
    "emb_dim": 2048,
    "n_heads": 8,
    "n_layers": 18,
    "hidden_dim": 16384,
    "head_dim": 256,
    "dtype": torch.bfloat16,
}

# Load the model weights
model = Gemma3Model(GEMMA3_CONFIG)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.load_state_dict(torch.load(model_path, map_location=device))
model = model.to(device)
model.eval()

# Generate text
tokenizer = tiktoken.get_encoding("gpt2")
# Your generation code goes here -- see the minimal sampling sketch below.
```
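
As a starting point for that last step, here is a minimal temperature-sampling loop. It is a sketch, not the repo's own generation code: it assumes `Gemma3Model`'s forward pass maps a `(batch, seq)` tensor of token ids to `(batch, seq, vocab_size)` logits, and the prompt, temperature, and token budget are arbitrary choices.

```python
# Continues from the snippet above (model, tokenizer, device already set up).
prompt = "Once upon a time"
ids = torch.tensor([tokenizer.encode(prompt)], device=device)

with torch.no_grad():
    for _ in range(100):  # generate up to 100 new tokens
        # Keep only the ids the GPT-2 tokenizer can decode, at the last position.
        logits = model(ids)[:, -1, :tokenizer.n_vocab]
        # Temperature 0.8; cast to float32 for stable sampling.
        probs = torch.softmax(logits.float() / 0.8, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0].tolist()))
```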

## Training Details

- **Optimizer**: AdamW with weight decay 0.1
- **Learning Rate**: 1e-4 with warmup and cosine decay (sketched below)
- **Batch Size**: 32, with gradient accumulation over 32 steps
- **Context Length**: 128 tokens
- **Mixed Precision**: bfloat16 training
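
For reference, warmup plus cosine decay can be written as a multiplier on the 1e-4 base rate. The warmup length and decay floor below are illustrative assumptions, not the values used for this run:

```python
import math

def lr_lambda(step, warmup_steps=2_000, total_steps=150_000, min_ratio=0.1):
    """Linear warmup, then cosine decay toward min_ratio * base LR.

    warmup_steps and min_ratio are illustrative guesses, not the training values.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_ratio + (1.0 - min_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))

# Usage with the setup described above:
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```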

## Model Architecture

- Multi-Head Attention
- RoPE (Rotary Position Embeddings)
- RMSNorm for normalization
- SiLU activation function
- 18 transformer layers (one block is sketched below)
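
The exact implementation lives in `Gemma3_model.py`; the schematic below only illustrates how the listed pieces typically fit together in a pre-norm block. It omits RoPE (applied to queries and keys inside attention) and any Gemma-specific details such as the real head-dimension handling, so treat it as an illustration rather than the model's code:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """y = w * x / sqrt(mean(x^2) + eps): no mean-centering, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class Block(nn.Module):
    """Pre-norm transformer block: x + Attn(Norm(x)), then x + MLP(Norm(x))."""
    def __init__(self, emb_dim=2048, n_heads=8, hidden_dim=16384):
        super().__init__()
        self.norm1 = RMSNorm(emb_dim)
        self.attn = nn.MultiheadAttention(emb_dim, n_heads, bias=False, batch_first=True)
        self.norm2 = RMSNorm(emb_dim)
        self.mlp = nn.Sequential(  # SiLU feed-forward network
            nn.Linear(emb_dim, hidden_dim, bias=False),
            nn.SiLU(),
            nn.Linear(hidden_dim, emb_dim, bias=False),
        )

    def forward(self, x, attn_mask=None):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))
```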

## Performance

The model was trained on TinyStories, a dataset of simple stories for children. It can generate coherent short stories in a similar style.

## Citation

If you use this model, please cite:

```bibtex
@misc{gemma3-tinystories-2025,
  author = {Your Name},
  title = {Gemma3-270M Pre-trained on TinyStories},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/vuminhtue/gemma3_270m_150k_tinystories}},
}
```

## License

MIT License

## Contact

For questions or issues, please open an issue on the Hugging Face model page.