Model card updated after epoch 0
README.md CHANGED

````diff
@@ -31,9 +31,9 @@ A 768M parameter Hierarchical Recurrent Memory (HRM) language model trained on h
 ### Configuration
 ```python
 Model Dimensions:
-- d_model:
-- n_heads:
-- d_ff:
+- d_model: 1024
+- n_heads: 16 (for compatibility, not used in Mamba)
+- d_ff: 4096
 - H_layers: 12 (high-level hierarchy)
 - L_layers: 12 (low-level processing)
 
@@ -45,10 +45,10 @@ Mamba2 Settings:
 - ngroups: 1
 
 Training:
-- Max halt steps:
+- Max halt steps: 1
 - Block size: 1024
-- Batch size:
-- Learning rate:
+- Batch size: 64 (effective)
+- Learning rate: 3e-05 → 1e-06
 - Weight decay: 0.1
 ```
 
@@ -58,16 +58,16 @@ Training:
 - **Tokenizer**: `t5-small` (T5 SentencePiece)
 - **Vocab Size**: 32100
 
-## Latest Performance (Epoch
+## Latest Performance (Epoch 0)
 
-- **Validation Loss**: `
-- **Validation Perplexity**: `
+- **Validation Loss**: `10.3766`
+- **Validation Perplexity**: `32099.98`
 
 ## Usage
 
 ```python
 from transformers import T5Tokenizer
-from
+from hrm_text1_mamba1_donor import HRMText1
 
 tokenizer = T5Tokenizer.from_pretrained("t5-small")
 model = HRMText1.from_pretrained("Viharikvs/CMBA-768M-OpenWebMath")
````