Update README.md
README.md CHANGED
@@ -16,20 +16,17 @@ language:
 tags:
 - text-generation
 - transformers
+- llama
 - research
 - code
 - mathematics
 - reasoning
 - multilingual
 - long-context
+- safetensors
 pipeline_tag: text-generation
 library_name: transformers
-datasets:
-- scientific_papers
-- code_repositories
-- mathematical_proofs
-- conversational_data
-- multilingual_corpus
+model_type: llama
 inference: true
 ---
 
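For orientation, the block above is the YAML metadata the Hub parses for the model card. A quick way to confirm the merged tags and pipeline settings after a change like this is `huggingface_hub`; the repo id below is a placeholder, not taken from this commit.

```python
# Sketch: reading the card metadata edited above. "your-org/Helion-2.5-Rnd" is a
# placeholder repo id; swap in the real one.
from huggingface_hub import ModelCard

card = ModelCard.load("your-org/Helion-2.5-Rnd")
print(card.data.tags)                         # should now include "llama" and "safetensors"
print(card.data.library_name)                 # "transformers"
print(card.data.pipeline_tag)                 # "text-generation"
print(card.data.to_dict().get("model_type"))  # "llama"
```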
@@ -45,9 +42,9 @@ The model demonstrates exceptional performance in complex reasoning tasks, scori
 
 ### Core Specifications
 
-Helion-2.5-Rnd is built upon the LLaMA architecture with significant enhancements:
+Helion-2.5-Rnd is built upon an advanced transformer architecture with the following specifications:
 
-- **Parameters**: 70 billion
+- **Parameters**: 70 billion parameters
 - **Architecture Type**: Transformer-based causal language model
 - **Hidden Size**: 4096 dimensions
 - **Layers**: 32 transformer blocks
@@ -57,7 +54,8 @@ Helion-2.5-Rnd is built upon the LLaMA architecture with significant enhancement
 - **Context Window**: 131,072 tokens (128K)
 - **Positional Encoding**: YARN (Yet Another RoPE extensioN) with factor 8.0
 - **RoPE Theta**: 500,000
-- **Precision**: BF16/FP16 native
+- **Precision**: BF16/FP16 native (no quantization)
+- **Weight Format**: SafeTensors for secure model storage
 
 ### Technical Innovations
 
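The architecture bullets above (hidden size, layer count, context window, RoPE theta, YARN factor, native dtype) are the values one would expect to surface in the repository's `config.json`. A small inspection sketch with `transformers` follows; the repo id is a placeholder and the exact `rope_scaling` layout is an assumption, since the commit does not show the config file itself.

```python
# Sketch: inspecting the configuration values listed in the hunk above.
# "your-org/Helion-2.5-Rnd" is a placeholder repo id.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("your-org/Helion-2.5-Rnd")
print(cfg.hidden_size)              # expected 4096
print(cfg.num_hidden_layers)        # expected 32
print(cfg.max_position_embeddings)  # expected 131072 (128K)
print(cfg.rope_theta)               # expected 500000
print(cfg.rope_scaling)             # assumed YARN-style, e.g. {"rope_type": "yarn", "factor": 8.0, ...}
print(cfg.torch_dtype)              # expected torch.bfloat16 or torch.float16
```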
@@ -75,20 +73,8 @@ The model incorporates several key architectural improvements:
 
 ## Training Methodology
 
-### Data Composition
-
-The model was trained on 2.5 trillion tokens drawn from diverse high-quality sources:
-
-- Scientific papers and academic literature
-- Open-source code repositories across multiple programming languages
-- Mathematical proofs and computational reasoning datasets
-- High-quality conversational data
-- Multilingual text corpus covering 50+ languages
-- Technical documentation and structured knowledge
-
 ### Training Configuration
 
-- **Base Model**: Meta-Llama-3.1-70B
 - **Training Steps**: 150,000 steps
 - **Warmup Steps**: 2,000 steps
 - **Learning Rate**: 2.0e-5 with cosine scheduling
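The surviving training bullets (150,000 steps, 2,000 warmup steps, learning rate 2.0e-5 with cosine scheduling) describe a standard warmup-plus-cosine-decay setup. A minimal illustration with PyTorch and `transformers` is below; it is not the authors' training code, and the parameter group is a stand-in.

```python
# Minimal sketch of the schedule described above: 2.0e-5 peak LR, 2,000 warmup steps,
# cosine decay over 150,000 total steps. Not the authors' training code.
import torch
from transformers import get_cosine_schedule_with_warmup

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for real model parameters
optimizer = torch.optim.AdamW(params, lr=2.0e-5)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=2_000,
    num_training_steps=150_000,
)

for step in range(10):  # a real loop would run 150,000 steps with forward/backward passes
    optimizer.step()
    scheduler.step()
```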
@@ -139,6 +125,18 @@ The model maintains consistent performance across its full 131K token context wi
 
 ## Installation and Deployment
 
+### Model Files
+
+The model is distributed using SafeTensors format for enhanced security and faster loading:
+
+```
+model.safetensors.index.json  # Model shard index
+model-00001-of-00015.safetensors
+model-00002-of-00015.safetensors
+...
+model-00015-of-00015.safetensors
+```
+
 ### Prerequisites
 
 ```bash
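When the shards above are present, `transformers` resolves `model.safetensors.index.json` automatically during `from_pretrained`. To inspect a single shard without loading the whole model, the `safetensors` library can open it lazily; the file name below comes from the listing, but the local path is assumed.

```python
# Sketch: lazily inspecting one shard from the listing above without loading the model.
from safetensors import safe_open

with safe_open("model-00001-of-00015.safetensors", framework="pt", device="cpu") as f:
    for name in list(f.keys())[:5]:                 # first few tensor names
        print(name, f.get_slice(name).get_shape())  # shapes only, no weights loaded
```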
@@ -329,14 +327,7 @@ inference:
 - **Storage**: 1TB+ NVMe SSD
 - **Network**: 100Gbps InfiniBand for optimal performance
 
-
-
-For reduced memory requirements:
-
-- **INT8**: ~50% memory reduction, minimal quality loss
-- **INT4**: ~75% memory reduction, acceptable for many tasks
-- **GPTQ**: Optimized 4-bit quantization
-- **AWQ**: Activation-aware weight quantization
+**Note**: This model is provided in full precision (BF16/FP16) without quantization to maintain maximum quality and accuracy.
 
 ## Use Cases and Applications
 
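Consistent with the note added in the hunk above, a typical full-precision load uses BF16 with `transformers`. The repo id is a placeholder, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
# Sketch: loading in native BF16, as the added note describes. Placeholder repo id;
# device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Helion-2.5-Rnd"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # full-precision BF16 weights, no quantization
    device_map="auto",           # shard across available GPUs
)

prompt = "Prove that the sum of two even numbers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```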