JunHowie committed on
Commit
e64a099
·
verified ·
1 Parent(s): 4412991

Delete .ipynb_checkpoints

.ipynb_checkpoints/README-checkpoint.md DELETED
@@ -1,279 +0,0 @@
1
- ---
2
- license: mit
3
- library_name: transformers
4
- pipeline_tag: text-generation
5
- tags:
6
- - vLLM
7
- - AWQ
8
- language:
9
- - zh
10
- - en
11
- base_model:
12
- - deepseek-ai/DeepSeek-V3.1
13
- base_model_relation: quantized
14
-
15
- ---
16
- # DeepSeek-V3.1-AWQ-Lite
17
- Base model: [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1)
18
-
19
- ### 【Dependencies / Installation】
20
- As of **2025-08-28**, create a fresh Python environment and run:
21
-
22
- ```bash
23
- # ❗there are glitches with vllm 0.10.1.1, still looking for resolutions❗
24
- # ❗downgrade vllm for now ❗
25
- pip install vllm==0.9.2 transformers==4.53.0
26
-
27
- SITE_PACKAGES=$(pip -V | awk '{print $4}' | sed 's/\/pip$//')
28
- # ❗patch up AWQ MoE quant config, otherwise some modules cannot be properly loaded❗
29
- cp awq_marlin.py "$SITE_PACKAGES/vllm/model_executor/layers/quantization/awq_marlin.py"
30
- # ❗patch up for fp32 e_score_correction_bias, see https://www.github.com/vllm-project/vllm/pull/23640❗
31
- cp deepseek_v2.py "$SITE_PACKAGES/vllm/model_executor/models/deepseek_v2.py"
32
- ```
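The `pip -V | awk` pipeline above relies on the exact layout of pip's version banner. As a hedged alternative, the sketch below locates the install directory from Python without importing the package. The helper name `package_dir` is hypothetical, and the sketch assumes it runs in the same interpreter that vLLM was installed into.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the install directory of a top-level package, or None if absent."""
    spec = importlib.util.find_spec(name)
    # Plain modules (not packages) have no search locations, so treat them as absent.
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


if __name__ == "__main__":
    vllm_dir = package_dir("vllm")
    if vllm_dir is None:
        print("vllm is not installed in this environment")
    else:
        # The two files that the cp commands above overwrite.
        print(vllm_dir / "model_executor/layers/quantization/awq_marlin.py")
        print(vllm_dir / "model_executor/models/deepseek_v2.py")
```

The printed paths can then be used as the `cp` targets in place of the `$SITE_PACKAGES/vllm/...` paths.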
33
-
34
- ### 【vLLM Single Node with 8 GPUs — Startup Command】
35
- ```bash
36
- CONTEXT_LENGTH=32768
37
-
38
- vllm serve \
39
- QuantTrio/DeepSeek-V3.1-AWQ-Lite \
40
- --served-model-name DeepSeek-V3.1-AWQ-Lite \
41
- --swap-space 16 \
42
- --max-num-seqs 512 \
43
- --max-model-len $CONTEXT_LENGTH \
44
- --max-seq-len-to-capture $CONTEXT_LENGTH \
45
- --gpu-memory-utilization 0.8 \
46
- --tensor-parallel-size 8 \
47
- --trust-remote-code \
48
- --disable-log-requests \
49
- --host 0.0.0.0 \
50
- --port 8000
51
- ```
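Once the server is up, it can be queried through vLLM's OpenAI-compatible `/v1/chat/completions` route. The following is a minimal sketch rather than an official client: the helper names `build_chat_request` and `ask` are hypothetical, `localhost` is assumed as the host, and passing `chat_template_kwargs` to toggle thinking mode is an assumption about the vLLM version in use.

```python
import json
import urllib.request


def build_chat_request(prompt: str, thinking: bool = False,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the served model (name from --served-model-name)."""
    payload = {
        "model": "DeepSeek-V3.1-AWQ-Lite",
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: the server forwards these kwargs to the chat template.
        "chat_template_kwargs": {"thinking": thinking},
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, thinking: bool = False) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(build_chat_request(prompt, thinking), timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("1+1=?")` requires the `vllm serve` process above to be running and reachable on port 8000.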
52
-
53
- ### 【Logs】
54
- ```
55
- 2025-08-28
56
- 1. Initial commit
57
- ```
58
-
59
- ### 【Model Files】
60
- | File Size | Last Updated |
61
- |-----------|--------------|
62
- | `337GB` | `2025-08-28` |
63
-
64
- ### 【Model Download】
65
- ```python
66
- from huggingface_hub import snapshot_download
67
- snapshot_download('QuantTrio/DeepSeek-V3.1-AWQ-Lite', cache_dir="your_local_path")
68
- ```
69
-
70
- ### 【Overview】
71
- <div align="center">
72
- <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
73
- </div>
74
- <hr>
75
- <div align="center" style="line-height: 1;">
76
- <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
77
- <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
78
- </a>
79
- <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
80
- <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
81
- </a>
82
- <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
83
- <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
84
- </a>
85
- </div>
86
-
87
- <div align="center" style="line-height: 1;">
88
- <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
89
- <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
90
- </a>
91
- <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
92
- <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
93
- </a>
94
- <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
95
- <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
96
- </a>
97
- </div>
98
-
99
- <div align="center" style="line-height: 1;">
100
- <a href="LICENSE" style="margin: 2px;">
101
- <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
102
- </a>
103
- </div>
104
-
105
- ## Introduction
106
-
107
- DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
108
-
109
- - **Hybrid thinking mode**: One model supports both thinking mode and non-thinking mode by changing the chat template.
110
-
111
- - **Smarter tool calling**: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
112
-
113
- - **Higher thinking efficiency**: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
114
-
115
- DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.
116
-
117
- ## Model Downloads
118
-
119
- <div align="center">
120
-
121
- | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
122
- | :------------: | :------------: | :------------: | :------------: | :------------: |
123
- | DeepSeek-V3.1-Base | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1-Base) |
124
- | DeepSeek-V3.1 | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1) |
125
-
126
- </div>
127
-
128
- ## Chat Template
129
-
130
- The details of our chat template are described in `tokenizer_config.json` and `assets/chat_template.jinja`. Here is a brief description.
131
-
132
- ### Non-Thinking
133
-
134
- #### First-Turn
135
-
136
- Prefix:
137
- `<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>`
138
-
139
- With the given prefix, DeepSeek V3.1 generates responses to queries in non-thinking mode. Unlike DeepSeek V3, it introduces an additional token `</think>`.
140
-
141
- #### Multi-Turn
142
- Context:
143
- `<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>...<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>`
144
-
145
- Prefix:
146
- `<|User|>{query}<|Assistant|></think>`
147
-
148
- By concatenating the context and the prefix, we obtain the correct prompt for the query.
149
-
150
- ### Thinking
151
-
152
- #### First-Turn
153
- Prefix:
154
- `<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|><think>`
155
-
156
- The prefix of thinking mode is similar to DeepSeek-R1.
157
-
158
-
159
- #### Multi-Turn
160
- Context:
161
- `<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>...<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>`
162
-
163
- Prefix:
164
- `<|User|>{query}<|Assistant|><think>`
165
-
166
- The multi-turn context is the same as in the non-thinking multi-turn chat template: the thinking content of earlier turns is dropped, but `</think>` is retained in every turn of the context.
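The first-turn and multi-turn templates above can be sketched as plain string assembly. This is an illustration only, assuming messages in the usual `role`/`content` form; the function name `build_prompt` is hypothetical, and the chat template shipped in `tokenizer_config.json` remains the authoritative implementation.

```python
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"


def build_prompt(messages, thinking=False):
    """Assemble a prompt string following the templates described above."""
    parts = [BOS]
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            parts.append(content)
        elif role == "user":
            parts.append(f"<|User|>{content}")
        elif role == "assistant":
            # Drop earlier reasoning: keep only the text after the last </think>.
            answer = content.split("</think>")[-1]
            parts.append(f"<|Assistant|></think>{answer}{EOS}")
        else:
            raise ValueError(f"unsupported role: {role}")
    # Generation prefix: <think> opens thinking mode, </think> skips it.
    parts.append("<|Assistant|>" + ("<think>" if thinking else "</think>"))
    return "".join(parts)
```

Only the final prefix differs between the two modes; the context built from earlier turns is identical.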
167
-
168
- ### ToolCall
169
- Toolcall is supported in non-thinking mode. The format is:
170
-
171
- `<|begin▁of▁sentence|>{system prompt}{tool_description}<|User|>{query}<|Assistant|></think>` where the tool_description is
172
-
173
- ```
174
- ## Tools
175
- You have access to the following tools:
176
-
177
- ### {tool_name1}
178
- Description: {description}
179
-
180
- Parameters: {json.dumps(parameters)}
181
-
182
- IMPORTANT: ALWAYS adhere to this exact format for tool use:
183
- <|tool▁calls▁begin|><|tool▁call▁begin|>tool_call_name<|tool▁sep|>tool_call_arguments<|tool▁call▁end|>{{additional_tool_calls}}<|tool▁calls▁end|>
184
-
185
- Where:
186
- - `tool_call_name` must be an exact match to one of the available tools
187
- - `tool_call_arguments` must be valid JSON that strictly follows the tool's Parameters Schema
188
- - For multiple tool calls, chain them directly without separators or spaces
189
- ```
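On the receiving side, a completion that follows the tool-call format above has to be split back into tool names and JSON arguments. The sketch below shows one way to do that with a regular expression; the function name `parse_tool_calls` is hypothetical, and the sketch assumes the special tokens appear in the decoded text exactly as written in the format description.

```python
import json
import re

CALLS_BEGIN = "<|tool▁calls▁begin|>"
CALLS_END = "<|tool▁calls▁end|>"
CALL_BEGIN = "<|tool▁call▁begin|>"
CALL_SEP = "<|tool▁sep|>"
CALL_END = "<|tool▁call▁end|>"

# One call: begin marker, tool name, separator, JSON arguments, end marker.
_CALL_RE = re.compile(
    re.escape(CALL_BEGIN) + r"(.*?)" + re.escape(CALL_SEP)
    + r"(.*?)" + re.escape(CALL_END),
    re.DOTALL,
)


def parse_tool_calls(text):
    """Return a list of (tool_name, arguments) pairs found in the model output."""
    start = text.find(CALLS_BEGIN)
    end = text.find(CALLS_END, start)
    if start == -1 or end == -1:
        return []  # no tool-call block in this output
    block = text[start + len(CALLS_BEGIN):end]
    # json.loads raises ValueError if the arguments are not valid JSON.
    return [(name.strip(), json.loads(args)) for name, args in _CALL_RE.findall(block)]
```

Checking each returned name against the list of available tools, as the format rules require, is left to the caller.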
190
-
191
- ### Code-Agent
192
- We support various code agent frameworks. Please refer to the above toolcall format to create your own code agents. An example is shown in `assets/code_agent_trajectory.html`.
193
-
194
- ### Search-Agent
195
- We designed a specific tool-call format for search in thinking mode to support search agents.
196
-
197
- For complex questions that require accessing external or up-to-date information, DeepSeek-V3.1 can leverage a user-provided search tool through a multi-turn tool-calling process.
198
-
199
- Please refer to the `assets/search_tool_trajectory.html` and `assets/search_python_tool_trajectory.html` for the detailed template.
200
-
201
- ## Evaluation
202
- | Category | Benchmark (Metric) | DeepSeek V3.1-NonThinking | DeepSeek V3 0324 | DeepSeek V3.1-Thinking | DeepSeek R1 0528
203
- |----------|----------------------------------|-----------------|---|---|---|
204
- | General |
205
- | | MMLU-Redux (EM) | 91.8 | 90.5 | 93.7 | 93.4
206
- | | MMLU-Pro (EM) | 83.7 | 81.2 | 84.8 | 85.0
207
- | | GPQA-Diamond (Pass@1) | 74.9 | 68.4 | 80.1 | 81.0
208
- | | Humanity's Last Exam (Pass@1) | - | - | 15.9 | 17.7
209
- |Search Agent|
210
- | | BrowseComp | - | - | 30.0 | 8.9
211
- | | BrowseComp_zh | - | - | 49.2 | 35.7
212
- | | Humanity's Last Exam (Python + Search) |- | - | 29.8 | 24.8
213
- | | SimpleQA | - | - | 93.4 | 92.3
214
- | Code |
215
- | | LiveCodeBench (2408-2505) (Pass@1) | 56.4 | 43.0 | 74.8 | 73.3
216
- | | Codeforces-Div1 (Rating) | - | - | 2091 | 1930
217
- | | Aider-Polyglot (Acc.) | 68.4 | 55.1 | 76.3 | 71.6
218
- | Code Agent|
219
- | | SWE Verified (Agent mode) | 66.0 | 45.4 | - | 44.6
220
- | | SWE-bench Multilingual (Agent mode) | 54.5 | 29.3 | - | 30.5
221
- | | Terminal-bench (Terminus 1 framework) | 31.3 | 13.3 | - | 5.7
222
- | Math |
223
- | | AIME 2024 (Pass@1) | 66.3 | 59.4 | 93.1 | 91.4
224
- | | AIME 2025 (Pass@1) | 49.8 | 51.3 | 88.4 | 87.5
225
- | | HMMT 2025 (Pass@1) | 33.5 | 29.2 | 84.2 | 79.4 |
226
-
227
- Note:
228
- - Search agents are evaluated with our internal search framework, which uses a commercial search API + webpage filter + 128K context window. Search agent results of R1-0528 are evaluated with a pre-defined workflow.
229
-
230
- - SWE-bench is evaluated with our internal code agent framework.
231
-
232
- - HLE is evaluated with the text-only subset.
233
-
234
- ### Usage Example
235
-
236
- ```python
237
- import transformers
238
-
239
- tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")
240
-
241
- messages = [
242
- {"role": "system", "content": "You are a helpful assistant"},
243
- {"role": "user", "content": "Who are you?"},
244
- {"role": "assistant", "content": "<think>Hmm</think>I am DeepSeek"},
245
- {"role": "user", "content": "1+1=?"}
246
- ]
247
-
248
- tokenizer.apply_chat_template(messages, tokenize=False, thinking=True, add_generation_prompt=True)
249
- # '<|begin▁of▁sentence|>You are a helpful assistant<|User|>Who are you?<|Assistant|></think>I am DeepSeek<|end▁of▁sentence|><|User|>1+1=?<|Assistant|><think>'
250
-
251
- tokenizer.apply_chat_template(messages, tokenize=False, thinking=False, add_generation_prompt=True)
252
- # '<|begin▁of▁sentence|>You are a helpful assistant<|User|>Who are you?<|Assistant|></think>I am DeepSeek<|end▁of▁sentence|><|User|>1+1=?<|Assistant|></think>'
253
- ```
254
-
255
- ## How to Run Locally
256
-
257
- The model structure of DeepSeek-V3.1 is the same as DeepSeek-V3. Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running this model locally.
258
-
259
- ## License
260
-
261
- This repository and the model weights are licensed under the [MIT License](LICENSE).
262
-
263
- ## Citation
264
-
265
- ```
266
- @misc{deepseekai2024deepseekv3technicalreport,
267
- title={DeepSeek-V3 Technical Report},
268
- author={DeepSeek-AI},
269
- year={2024},
270
- eprint={2412.19437},
271
- archivePrefix={arXiv},
272
- primaryClass={cs.CL},
273
- url={https://arxiv.org/abs/2412.19437},
274
- }
275
- ```
276
-
277
- ## Contact
278
-
279
- If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).