Trouter-Library committed
Commit 4b84490 · verified · 1 Parent(s): 1c524f7

Update README.md

Files changed (1):
  1. README.md (+20 -29)
README.md CHANGED
@@ -16,20 +16,17 @@ language:
  tags:
  - text-generation
  - transformers
+ - llama
  - research
  - code
  - mathematics
  - reasoning
  - multilingual
  - long-context
+ - safetensors
  pipeline_tag: text-generation
  library_name: transformers
- datasets:
- - scientific_papers
- - code_repositories
- - mathematical_proofs
- - conversational_data
- - multilingual_corpus
+ model_type: llama
  inference: true
  ---
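
For reference, a minimal loading sketch consistent with the card metadata above (`library_name: transformers`, `model_type: llama`, `pipeline_tag: text-generation`). The repository id is an assumption inferred from the commit author and model name, not something stated in the diff:

```python
# Minimal sketch: load the model the way the card metadata declares
# (transformers library, llama model type, text-generation pipeline).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Trouter-Library/Helion-2.5-Rnd"  # assumed repo id, may differ

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # card lists BF16/FP16 native precision
    device_map="auto",           # spread the shards across available GPUs
)

prompt = "Explain why the sum of two even integers is always even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```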
 
@@ -45,9 +42,9 @@ The model demonstrates exceptional performance in complex reasoning tasks, scori
 
  ### Core Specifications
 
- Helion-2.5-Rnd is built upon the LLaMA architecture with significant enhancements:
+ Helion-2.5-Rnd is built upon an advanced transformer architecture with the following specifications:
 
- - **Parameters**: 70 billion+ parameters
+ - **Parameters**: 70 billion parameters
  - **Architecture Type**: Transformer-based causal language model
  - **Hidden Size**: 4096 dimensions
  - **Layers**: 32 transformer blocks
@@ -57,7 +54,8 @@ Helion-2.5-Rnd is built upon the LLaMA architecture with significant enhancement
  - **Context Window**: 131,072 tokens (128K)
  - **Positional Encoding**: YARN (Yet Another RoPE extensioN) with factor 8.0
  - **RoPE Theta**: 500,000
- - **Precision**: BF16/FP16 native, INT8/INT4 quantization supported
+ - **Precision**: BF16/FP16 native (no quantization)
+ - **Weight Format**: SafeTensors for secure model storage
 
  ### Technical Innovations
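
As an illustration only: the Core Specifications in the two hunks above (hidden size 4096, 32 layers, 131,072-token context, YARN factor 8.0, RoPE theta 500,000, BF16/FP16) would typically surface in a LLaMA-style `config.json` along these lines. Field names follow the usual transformers LLaMA convention; the values actually shipped with the model may differ:

```python
import json

# Illustrative mapping of the listed specifications onto LLaMA-style config fields.
# This is not the model's actual config.json.
illustrative_config = {
    "model_type": "llama",
    "hidden_size": 4096,                # "Hidden Size: 4096 dimensions"
    "num_hidden_layers": 32,            # "Layers: 32 transformer blocks"
    "max_position_embeddings": 131072,  # "Context Window: 131,072 tokens (128K)"
    "rope_theta": 500000.0,             # "RoPE Theta: 500,000"
    "rope_scaling": {"rope_type": "yarn", "factor": 8.0},  # "YARN ... with factor 8.0"
    "torch_dtype": "bfloat16",          # "Precision: BF16/FP16 native"
}

print(json.dumps(illustrative_config, indent=2))
```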
 
@@ -75,20 +73,8 @@ The model incorporates several key architectural improvements:
 
  ## Training Methodology
 
- ### Data Composition
-
- The model was trained on 2.5 trillion tokens drawn from diverse high-quality sources:
-
- - Scientific papers and academic literature
- - Open-source code repositories across multiple programming languages
- - Mathematical proofs and computational reasoning datasets
- - High-quality conversational data
- - Multilingual text corpus covering 50+ languages
- - Technical documentation and structured knowledge
-
  ### Training Configuration
 
- - **Base Model**: Meta-Llama-3.1-70B
  - **Training Steps**: 150,000 steps
  - **Warmup Steps**: 2,000 steps
  - **Learning Rate**: 2.0e-5 with cosine scheduling
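
The Training Configuration above fully determines the shape of the learning-rate curve, so a small sketch can make it concrete: linear warmup for 2,000 steps to the 2.0e-5 peak, then cosine decay over the remainder of the 150,000-step run. This reproduces the standard warmup-plus-cosine schedule and is not taken from the actual training code:

```python
import math

PEAK_LR = 2.0e-5       # "Learning Rate: 2.0e-5 with cosine scheduling"
WARMUP_STEPS = 2_000   # "Warmup Steps: 2,000 steps"
TOTAL_STEPS = 150_000  # "Training Steps: 150,000 steps"

def lr_at(step: int) -> float:
    """Learning rate at a given step under linear warmup + cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))

for s in (0, 2_000, 75_000, 150_000):
    print(f"step {s:>7}: lr = {lr_at(s):.2e}")
```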
@@ -139,6 +125,18 @@ The model maintains consistent performance across its full 131K token context wi
 
  ## Installation and Deployment
 
+ ### Model Files
+
+ The model is distributed using SafeTensors format for enhanced security and faster loading:
+
+ ```
+ model.safetensors.index.json # Model shard index
+ model-00001-of-00015.safetensors
+ model-00002-of-00015.safetensors
+ ...
+ model-00015-of-00015.safetensors
+ ```
+
  ### Prerequisites
 
  ```bash
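
The `model.safetensors.index.json` added in this hunk is the standard index file for a sharded SafeTensors checkpoint: it records the total size and maps every parameter name to the shard that stores it. A small inspection sketch, assuming the files listed above have been downloaded locally:

```python
import json
from collections import Counter

# Inspect the shard index of a sharded SafeTensors checkpoint.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

print("total size (bytes):", index["metadata"]["total_size"])

# weight_map: parameter name -> shard file that stores it
shard_counts = Counter(index["weight_map"].values())
for shard, n_tensors in sorted(shard_counts.items()):
    print(f"{shard}: {n_tensors} tensors")
```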
@@ -329,14 +327,7 @@ inference:
  - **Storage**: 1TB+ NVMe SSD
  - **Network**: 100Gbps InfiniBand for optimal performance
 
- ### Quantization Options
-
- For reduced memory requirements:
-
- - **INT8**: ~50% memory reduction, minimal quality loss
- - **INT4**: ~75% memory reduction, acceptable for many tasks
- - **GPTQ**: Optimized 4-bit quantization
- - **AWQ**: Activation-aware weight quantization
+ **Note**: This model is provided in full precision (BF16/FP16) without quantization to maintain maximum quality and accuracy.
 
  ## Use Cases and Applications
 