Viharikvs committed
Commit cc39d6b · verified
1 Parent(s): 837a57f

Model card updated after epoch 0

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -31,9 +31,9 @@ A 768M parameter Hierarchical Recurrent Memory (HRM) language model trained on h
  ### Configuration
  ```python
  Model Dimensions:
- - d_model: 768
- - n_heads: 12 (for compatibility, not used in Mamba)
- - d_ff: 3072
+ - d_model: 1024
+ - n_heads: 16 (for compatibility, not used in Mamba)
+ - d_ff: 4096
  - H_layers: 12 (high-level hierarchy)
  - L_layers: 12 (low-level processing)

@@ -45,10 +45,10 @@ Mamba2 Settings:
  - ngroups: 1

  Training:
- - Max halt steps: 8
+ - Max halt steps: 1
  - Block size: 1024
- - Batch size: 32 (effective)
- - Learning rate: 0.0002 → 1e-06
+ - Batch size: 64 (effective)
+ - Learning rate: 3e-05 → 1e-06
  - Weight decay: 0.1
  ```

@@ -58,16 +58,16 @@ Training:
  - **Tokenizer**: `t5-small` (T5 SentencePiece)
  - **Vocab Size**: 32100

- ## Latest Performance (Epoch 1)
+ ## Latest Performance (Epoch 0)

- - **Validation Loss**: `8.3293`
- - **Validation Perplexity**: `4143.72`
+ - **Validation Loss**: `10.3766`
+ - **Validation Perplexity**: `32099.98`

  ## Usage

  ```python
  from transformers import T5Tokenizer
- from hrm_text1_modeling import HRMText1
+ from hrm_text1_mamba1_donor import HRMText1

  tokenizer = T5Tokenizer.from_pretrained("t5-small")
  model = HRMText1.from_pretrained("Viharikvs/CMBA-768M-OpenWebMath")
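A note on the updated learning-rate entry: `3e-05 → 1e-06` describes a decay from a peak rate down to a floor, but the card does not say which schedule produces it. The sketch below shows one common realization (cosine annealing, an assumption); the optimizer choice, step count, and parameters are hypothetical, and only the peak LR, floor LR, and weight decay come from the card.

```python
import torch

# Hypothetical parameters standing in for the HRM model's weights.
params = [torch.nn.Parameter(torch.zeros(10))]

# Peak LR 3e-5 and weight decay 0.1 are taken from the card;
# AdamW and the total step count are assumptions.
optimizer = torch.optim.AdamW(params, lr=3e-5, weight_decay=0.1)
total_steps = 10_000  # hypothetical

# Cosine decay from 3e-5 down to the card's 1e-6 floor.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=1e-6
)

for step in range(total_steps):
    # forward/backward pass would go here
    optimizer.step()
    scheduler.step()
```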
 
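The two performance rows are internally consistent: the reported validation perplexity is the exponential of the reported validation cross-entropy loss, both for the old epoch-1 figures and the new epoch-0 figures. A quick check:

```python
import math

# exp(loss) reproduces the card's perplexities up to rounding:
#   exp(8.3293)  ≈ 4143.5  vs reported 4143.72  (epoch 1, previous card)
#   exp(10.3766) ≈ 32099   vs reported 32099.98 (epoch 0, this update)
for label, loss in [("epoch 1 (old)", 8.3293), ("epoch 0 (new)", 10.3766)]:
    print(f"{label}: loss={loss} -> perplexity={math.exp(loss):,.2f}")
```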
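The usage snippet in the README stops after loading the model. A minimal continuation for generating text could look like the sketch below; it assumes the custom `HRMText1` class from `hrm_text1_mamba1_donor` follows the standard `transformers` generation interface (`model.generate(...)`), which the card does not confirm, and the prompt is purely illustrative.

```python
import torch
from transformers import T5Tokenizer
from hrm_text1_mamba1_donor import HRMText1  # custom module shipped with the repo

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = HRMText1.from_pretrained("Viharikvs/CMBA-768M-OpenWebMath")
model.eval()

# Illustrative math prompt, in line with the OpenWebMath training data.
prompt = "Let f(x) = x^2 + 3x. Then f'(x) ="
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Assumption: HRMText1 inherits transformers' GenerationMixin; if not,
    # a manual decoding loop over model(**inputs).logits would be needed.
    output_ids = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```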