---
base_model: t5-small
license: apache-2.0
datasets:
- open-web-math/open-web-math
tags:
- text-generation
- causal-lm
- mamba
- hrm
- pytorch
language:
- en
pipeline_tag: text-generation
---

# CMBA-768M-OpenWebMath

A 768M-parameter Hierarchical Recurrent Memory (HRM) language model trained on high-quality math web text from OpenWebMath. The model uses **Mamba2 state-space models** in place of traditional attention, enabling efficient long-range sequence modeling.

## Model Architecture

**CMBA** (Causal Mamba-based Architecture) implements a hierarchical processing structure:

- **Hierarchical Design**: Dual-level processing with H-layers (high-level abstraction) and L-layers (low-level specialists)
- **Mamba2 Mixers**: State-space models replace attention, reducing sequence complexity from O(n²) to O(n)
- **Adaptive Computation**: A halting mechanism allows variable compute per token (ACT-style pondering)
- **Parameters**: ~768M total
- **Context Length**: 1024 tokens

An unofficial sketch of this structure is included at the end of this card.

### Configuration

```text
Model Dimensions:
- d_model: 1024
- n_heads: 16 (kept for config compatibility; unused by Mamba)
- d_ff: 4096
- H_layers: 12 (high-level hierarchy)
- L_layers: 12 (low-level processing)

Mamba2 Settings:
- d_state: 128
- expand: 2
- headdim: 64
- d_conv: 4
- ngroups: 1

Training:
- Max halt steps: 1
- Block size: 1024
- Batch size: 64 (effective)
- Learning rate: 3e-05 → 1e-06
- Weight decay: 0.1
```

## Training Data

- **Dataset**: [open-web-math/open-web-math](https://huggingface.co/datasets/open-web-math/open-web-math)
- **Tokenizer**: `t5-small` (T5 SentencePiece)
- **Vocab Size**: 32100

## Latest Performance (Epoch 0)

- **Validation Loss**: `10.3766`
- **Validation Perplexity**: `32099.98`

Perplexity is `exp(validation loss)`. A value this close to the vocabulary size (32100) means the checkpoint is still predicting nearly uniformly over the vocabulary; a quick numerical check is included at the end of this card.

## Usage

```python
from transformers import T5Tokenizer

# HRMText1 is the custom model class shipped with this repository
from hrm_text1_mamba1_donor import HRMText1

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = HRMText1.from_pretrained("Viharikvs/CMBA-768M-OpenWebMath")

# Generate text
input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Citation

If you use this model, please cite:

```bibtex
@misc{cmba-768m-openwebmath,
  author = {Vihari},
  title = {CMBA-768M-OpenWebMath: Hierarchical Mamba-based Language Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Viharikvs/CMBA-768M-OpenWebMath}
}
```

## License

Apache 2.0
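
## Architecture Sketch (Unofficial)

The reference implementation is the `HRMText1` class in `hrm_text1_mamba1_donor.py`. The following is a minimal sketch of the dual-level design described above: stacked L-layer and H-layer Mamba2 mixers with an ACT-style halting head. It assumes the `mamba-ssm` package (CUDA required); all class and variable names here are illustrative and do not come from the released code.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba2  # pip install mamba-ssm (needs a CUDA GPU)


class CMBASketch(nn.Module):
    """Illustrative dual-level Mamba2 stack; normalization omitted for brevity."""

    def __init__(self, d_model=1024, n_h=12, n_l=12, max_halt_steps=1, vocab_size=32100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        mixer = lambda: Mamba2(d_model, d_state=128, d_conv=4, expand=2, headdim=64)
        self.l_layers = nn.ModuleList(mixer() for _ in range(n_l))  # low-level specialists
        self.h_layers = nn.ModuleList(mixer() for _ in range(n_h))  # high-level abstraction
        self.halt = nn.Linear(d_model, 1)  # per-token halting probability (ACT-style)
        self.max_halt_steps = max_halt_steps
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, input_ids):
        x = self.embed(input_ids)  # (batch, seq, d_model)
        # With max_halt_steps=1 (as in this training run), the loop runs exactly once.
        for _ in range(self.max_halt_steps):
            for layer in self.l_layers:
                x = x + layer(x)  # residual low-level pass, O(n) in sequence length
            for layer in self.h_layers:
                x = x + layer(x)  # residual high-level pass
            p_halt = torch.sigmoid(self.halt(x))
            if bool((p_halt > 0.5).all()):  # crude halting rule, for illustration only
                break
        return self.lm_head(x)  # next-token logits
```

With `d_model=1024`, `expand=2`, and `headdim=64`, each Mamba2 mixer internally runs `d_model × expand / headdim = 32` state-space heads (distinct from the unused `n_heads: 16` in the configuration table above).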
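
## Perplexity Sanity Check

The two reported metrics are consistent with each other, since perplexity is the exponential of the mean cross-entropy loss. A one-liner confirms it:

```python
import math

# exp(10.3766) ≈ 32099.7, matching the reported perplexity of 32099.98
# up to rounding of the printed loss. This also equals the T5 vocab size
# (32100), i.e. the epoch-0 checkpoint is near-uniform over the vocabulary.
print(math.exp(10.3766))
```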