Update README.md

base_model:
- deepseek-ai/DeepSeek-V3-0324
---

# Channel-wise INT8 DeepSeek-V3-0324

The INT8 quantization of DeepSeek-V3-0324 for SGLang (https://github.com/sgl-project/sglang).

[PULL REQUEST](https://github.com/sgl-project/sglang/pull/3888)

## 1. Quantization Process

We apply INT8 quantization to the BF16 checkpoints.

The quantization scales are determined by dividing the channel-wise maximum of the element values by the INT8 type maximum.
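
For intuition, here is a minimal sketch of that per-channel scheme in PyTorch. It is an assumption-laden illustration, not the repo's script: the actual `bf16_cast_channel_int8.py` may pick a different channel axis and also handles the sharded safetensors checkpoint files.

```python
import torch

def quantize_channel_int8(weight: torch.Tensor):
    """Symmetric per-channel INT8 quantization of a weight matrix.

    Sketch only: assumes `weight` is [out_channels, in_channels] and that
    scales are taken over each output channel, as described above.
    """
    w = weight.float()
    # Channel-wise maximum of |w| divided by the INT8 maximum (127) gives the scale.
    scale = (w.abs().amax(dim=1, keepdim=True) / 127.0).clamp(min=1e-12)
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale.squeeze(1)

# Dequantization for reference: w ≈ q.float() * scale[:, None]
```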

To generate these weights, run the provided script in the `./inference` directory:

```
python3 bf16_cast_channel_int8.py --input-bf16-hf-path /path/to/bf16-weights/ --output-int8-hf-path /path/to/save-int8-weight/
```

## 2. Troubleshooting

Before inference, confirm that there is no `quantization_config` attribute in `config.json`.
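
As a quick check, the snippet below removes the attribute if it is present. This is an illustrative sketch, not part of the repo; the path is the output placeholder from the command above.

```python
import json

config_path = "/path/to/save-int8-weight/config.json"  # placeholder output path from above

with open(config_path) as f:
    cfg = json.load(f)

# Drop the attribute if present, then write the cleaned config back.
if cfg.pop("quantization_config", None) is not None:
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    print("removed quantization_config from config.json")
```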

---

# DeepSeek-V3-0324