Update README.md

base_model:
- deepseek-ai/DeepSeek-V3-0324
---

# Channel-wise INT8 DeepSeek-V3-0324

The INT8 quantization of DeepSeek-V3-0324 for SGLang (https://github.com/sgl-project/sglang).

[PULL REQUEST](https://github.com/sgl-project/sglang/pull/3888)

## 1. Quantization Process

We apply INT8 quantization to the BF16 checkpoints.

The quantization scales are determined by dividing the channel-wise maximum of the element values by the INT8 type maximum.
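
For intuition, here is a minimal sketch of that per-channel scheme in PyTorch. It is an assumption-laden illustration, not the repo's script: the actual `bf16_cast_channel_int8.py` may pick a different channel axis and also handles the sharded safetensors checkpoint files.

```python
import torch

def quantize_channel_int8(weight: torch.Tensor):
    """Symmetric per-channel INT8 quantization of a weight matrix.

    Sketch only: assumes `weight` is [out_channels, in_channels] and that
    scales are taken over each output channel, as described above.
    """
    w = weight.float()
    # Channel-wise maximum of |w| divided by the INT8 maximum (127) gives the scale.
    scale = (w.abs().amax(dim=1, keepdim=True) / 127.0).clamp(min=1e-12)
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale.squeeze(1)

# Dequantization for reference: w ≈ q.float() * scale[:, None]
```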

To generate these weights, run the provided script in the `./inference` directory:

```
python3 bf16_cast_channel_int8.py --input-bf16-hf-path /path/to/bf16-weights/ --output-int8-hf-path /path/to/save-int8-weight/
```

## 2. Troubleshooting

Before inference, confirm that there is no `quantization_config` attribute in `config.json`.
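
As a quick check, the snippet below removes the attribute if it is present. This is an illustrative sketch, not part of the repo; the path is the output placeholder from the command above.

```python
import json

config_path = "/path/to/save-int8-weight/config.json"  # placeholder output path from above

with open(config_path) as f:
    cfg = json.load(f)

# Drop the attribute if present, then write the cleaned config back.
if cfg.pop("quantization_config", None) is not None:
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    print("removed quantization_config from config.json")
```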

---

# DeepSeek-V3-0324