Safetensors · qwen2 · fp8
juezhi committed (verified)
Commit ab5e598 · 1 parent: b0b7e2c

Update README.md

Files changed (1): README.md (+131 -41)
README.md CHANGED
@@ -1,63 +1,127 @@
-
  ---
  license: apache-2.0
  ---
 
- ## Introduction
-
- **InfiR2-1.5B-base-FP8** is derived from the **Qwen2.5-1.5B-base** model via continued pre-training using the **FP8** recipe.
-
- ## Model Download
-
- ```bash
- # Create a directory for models
- mkdir -p ./models
- # Download InfiR2-1.5B-base-FP8 model
- huggingface-cli download --resume-download InfiX-ai/InfiR2-1.5B-base-FP8 --local-dir ./models/InfiR2-1.5B-base-FP8
- ```
-
-
- ## Quick Start
 
  ```python
  import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
 
  MODEL_NAME = "InfiX-ai/InfiR2-1.5B-base-FP8"
 
  prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
 
- MAX_NEW_TOKENS = 256
- TEMPERATURE = 0.8
- DO_SAMPLE = True
 
- tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
 
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = AutoModelForCausalLM.from_pretrained(
-     MODEL_NAME,
-     torch_dtype=torch.bfloat16 if device == "cuda" else None
- ).to(device)
 
  messages = [
      {"role": "user", "content": prompt_text}
  ]
- input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", tokenize=True).to(device)
 
- with torch.no_grad():
-     output_ids = model.generate(
-         input_ids,
-         max_new_tokens=MAX_NEW_TOKENS,
-         temperature=TEMPERATURE,
-         do_sample=DO_SAMPLE,
-         pad_token_id=tokenizer.eos_token_id
-     )
 
- generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
 
- # For a base model, the output is often a direct continuation of the prompt.
- response_start_index = generated_text.rfind(prompt_text) + len(prompt_text)
- llm_response = generated_text[response_start_index:].strip()
 
  print("\n" + "="*70)
  print(f"Prompt: \n{prompt_text}")
@@ -66,11 +130,37 @@ print(f"(LLM Response): \n{llm_response}")
  print("="*70)
  ```
 
- ## Acknowledgements
 
  * We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
 
- ## Citation
 
  If you find our work useful, please cite:
 
 
  ---
  license: apache-2.0
  ---
 
+ # InfiR2-1.5B-base-FP8
+
+ <p align="center">
+   <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp; | &nbsp;
+   <a href="https://github.com/InfiXAI/InfiR2">🐙 Github</a> &nbsp; | &nbsp;
+   <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a>
+ </p>
+
+ We performed continual pre-training (CPT) on the **Qwen2.5-1.5B-base** model for an additional 160 billion tokens using the FP8 format. In this process, both the forward and backward passes employed the E4M3 format, and quantization scaling factors were represented in UE8M0. The training data mixture was composed of:
+ - 140B tokens from public sources, including FineWeb, the Nemotron datasets, stack-edu, and issues-kaggle-notebooks.
+ - A subsequent 20B tokens mixed with data from AM-DeepSeek-R1 and AM-Qwen3.
+
+ The resulting model is **InfiR2-1.5B-base-FP8**.
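+
+ As a rough illustration of the recipe above (not the actual training code, which is implemented in Megatron-LM / TransformerEngine), the sketch below shows E4M3 quantization with a UE8M0 scaling factor, i.e. a scale rounded to a power of two, in plain PyTorch. The helper name `quantize_e4m3_ue8m0` is made up for this example, and it assumes a PyTorch build with the `torch.float8_e4m3fn` dtype:
+
+ ```python
+ import torch
+
+ def quantize_e4m3_ue8m0(x: torch.Tensor):
+     """Quantize a float32 tensor to FP8 E4M3 with a UE8M0 (power-of-two) scale."""
+     fp8_max = torch.finfo(torch.float8_e4m3fn).max      # 448.0 for E4M3
+     amax = x.abs().max().clamp(min=1e-12)
+     # Pick a per-tensor scale so amax fits in the E4M3 range, then round it up
+     # to a power of two so it is exactly representable as an 8-bit exponent
+     # with no mantissa bits (UE8M0).
+     scale = torch.exp2(torch.ceil(torch.log2(amax / fp8_max)))
+     x_fp8 = (x / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
+     return x_fp8, scale
+
+ x = torch.randn(4, 8)
+ x_fp8, scale = quantize_e4m3_ue8m0(x)
+ x_hat = x_fp8.to(torch.float32) * scale                 # dequantize to inspect the error
+ print(scale.item(), (x - x_hat).abs().max().item())
+ ```
+
+ In the actual FP8 training run this quantization happens inside the fused FP8 GEMM kernels rather than in Python; the sketch only illustrates the number formats involved.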
+
+ **Training Recipe**:
+ <p align="center">
+   <img src="fp8_recipe.png" width="100%"/>
+ </p>
+
+ - Stable and reproducible performance
+ - Efficient, low-memory training
+
+ ## 🚀 InfiR2 Model Series
+
+ The InfiR2 framework offers multiple model variants with different sizes and training strategies:
+
+ - **1.5B**
+   - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pre-training on Qwen2.5-1.5B-base*
+   - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+ - **7B**
+   - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pre-training on Qwen2.5-7B-base*
+   - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+   - [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with the DAPO dataset*
+
+ ## 📊 Model Performance
+ The **InfiR2-1.5B-Instruct-FP8** model is the result of further fine-tuning applied to **InfiR2-1.5B-base-FP8**. For further details, refer to [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8). Below is the performance comparison of InfiR2-1.5B-Instruct-FP8 on reasoning benchmarks. Note: 'w. InfiAlign' denotes Supervised Fine-Tuning (SFT) using the InfiAlign dataset.
+
+ <div align="center">
+
+ <table>
+   <thead>
+     <tr>
+       <th align="left">Model</th>
+       <th align="center">AIME 25</th>
+       <th align="center">AIME 24</th>
+       <th align="center">GPQA</th>
+       <th align="center">LiveCodeBench v5</th>
+     </tr>
+   </thead>
+   <tbody>
+     <tr>
+       <td align="left"><strong>Deepseek-Distill-Qwen-1.5B</strong></td>
+       <td align="center">21.35</td>
+       <td align="center">26.87</td>
+       <td align="center">32.26</td>
+       <td align="center">18.50</td>
+     </tr>
+     <tr>
+       <td align="left"><strong>Qwen2.5-1.5B-base (w. InfiAlign)</strong></td>
+       <td align="center">14.58</td>
+       <td align="center">10.52</td>
+       <td align="center">28.98</td>
+       <td align="center">12.99</td>
+     </tr>
+     <tr>
+       <td align="left"><strong>InfiR2-1.5B-Instruct-FP8</strong></td>
+       <td align="center">18.45</td>
+       <td align="center">17.39</td>
+       <td align="center">29.48</td>
+       <td align="center">17.10</td>
+     </tr>
+   </tbody>
+ </table>
+
+ </div>
+
+ ## 🎭 Quick Start
 
  ```python
+ from vllm import LLM, SamplingParams
  import torch
 
  MODEL_NAME = "InfiX-ai/InfiR2-1.5B-base-FP8"
 
  prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
 
+ MAX_NEW_TOKENS = 256
+ TEMPERATURE = 0.8
 
+ llm = LLM(
+     model=MODEL_NAME,
+     dtype="auto",
+ )
 
+ sampling_params = SamplingParams(
+     n=1,
+     temperature=TEMPERATURE,
+     max_tokens=MAX_NEW_TOKENS,
+ )
 
+ tokenizer = llm.get_tokenizer()
  messages = [
      {"role": "user", "content": prompt_text}
  ]
+ prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 
+ outputs = llm.generate(
+     prompt_formatted,
+     sampling_params
+ )
 
+ generated_text = outputs[0].outputs[0].text
+ llm_response = generated_text.strip()
 
  print("\n" + "="*70)
  print(f"Prompt: \n{prompt_text}")
  print(f"(LLM Response): \n{llm_response}")
  print("="*70)
  ```
 
+ ## 📚 Model Download
+
+ ```bash
+ # Create a directory for models
+ mkdir -p ./models
+ # Download InfiR2-1.5B-base-FP8 model
+ huggingface-cli download --resume-download InfiX-ai/InfiR2-1.5B-base-FP8 --local-dir ./models/InfiR2-1.5B-base-FP8
+ ```
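+
+ Once the weights are downloaded, vLLM can load the checkpoint from the local directory instead of the Hub ID. A minimal sketch, assuming the `./models/InfiR2-1.5B-base-FP8` path created by the command above:
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # Point vLLM at the locally downloaded checkpoint rather than the Hub repo ID.
+ llm = LLM(model="./models/InfiR2-1.5B-base-FP8", dtype="auto")
+ params = SamplingParams(temperature=0.8, max_tokens=256)
+
+ output = llm.generate("Briefly explain what a black hole is.", params)[0]
+ print(output.outputs[0].text)
+ ```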
+ ## 🎯 Intended Uses
+
+ ### ✅ Direct Use
+
+ This model is intended for research and commercial use. Example use cases include:
+
+ - Instruction following
+ - Mathematical reasoning
+ - Code generation
+ - General reasoning
+
+ ### ❌ Out-of-Scope Use
+
+ The model should **not** be used for:
+
+ - Generating harmful, offensive, or inappropriate content
+ - Creating misleading information
+
+ ## 🙏 Acknowledgements
 
  * We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
 
+ ## 📌 Citation
 
  If you find our work useful, please cite: