Upload folder using huggingface_hub
- README.md +3 -1
- config.json +1 -1
- model.safetensors +1 -1
- quantize_config.json +1 -1
README.md
CHANGED
@@ -17,6 +17,8 @@ base_model_relation: quantized
 Base model: [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)

 <i>This model is quantized to 4-bit with a group size of 128.</i>
+<br>
+<i>Compared to earlier quantized versions, the new quantized model demonstrates better tokens/s efficiency. This improvement comes from setting desc_act=False in the quantization configuration.</i>

 ```
 vllm serve JunHowie/Qwen3-4B-Thinking-2507-GPTQ-Int4
@@ -261,4 +263,4 @@ If you find our work helpful, feel free to give us a cite.
     primaryClass={cs.CL},
     url={https://arxiv.org/abs/2505.09388},
 }
-```
+```
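The README's serving instructions come down to `vllm serve JunHowie/Qwen3-4B-Thinking-2507-GPTQ-Int4`, which starts an OpenAI-compatible HTTP server. As a minimal client sketch, assuming vLLM's default port 8000 and the `openai` Python package (neither is stated in this commit):

```python
# Minimal client sketch for the server started by:
#   vllm serve JunHowie/Qwen3-4B-Thinking-2507-GPTQ-Int4
# Assumes the default vLLM port (8000) and the openai client package.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="JunHowie/Qwen3-4B-Thinking-2507-GPTQ-Int4",
    messages=[{"role": "user", "content": "Briefly explain GPTQ 4-bit quantization."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```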
config.json
CHANGED
@@ -58,7 +58,7 @@
   "quantization_config": {
     "bits": 4,
     "checkpoint_format": "gptq",
-    "desc_act": true,
+    "desc_act": false,
     "group_size": 128,
     "hyb_act": false,
     "lm_head": false,
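The only functional change in config.json is the embedded quantization_config, which is what transformers reads when loading the checkpoint. A minimal loading sketch, assuming a CUDA GPU and an installed GPTQ backend (e.g. gptqmodel or auto-gptq), neither of which this commit specifies:

```python
# Minimal sketch: loading the GPTQ checkpoint with transformers, which picks up
# the "quantization_config" block from config.json shown above.
# Assumes a CUDA GPU plus a GPTQ kernel backend (gptqmodel or auto-gptq) is installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "JunHowie/Qwen3-4B-Thinking-2507-GPTQ-Int4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain GPTQ in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```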
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7fbf018a7cb2568b6ac95cbf54cf629e1ca8923a4722aad8211065f11226de5e
 size 2669888648
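model.safetensors in the repository is a Git LFS pointer file (version, sha256 oid, byte size); this commit swaps the weights, so the oid changes while the size stays 2669888648 bytes. A small sketch for checking a locally downloaded copy against the new pointer, where the local path is an assumption:

```python
# Minimal sketch: verify a locally downloaded model.safetensors against the
# Git LFS pointer's sha256 oid and byte size. The local path is hypothetical.
import hashlib
import os

path = "model.safetensors"  # assumed local download location
expected_oid = "7fbf018a7cb2568b6ac95cbf54cf629e1ca8923a4722aad8211065f11226de5e"
expected_size = 2669888648

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert os.path.getsize(path) == expected_size, "size mismatch"
assert h.hexdigest() == expected_oid, "sha256 mismatch"
print("pointer oid and size match")
```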
quantize_config.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "bits": 4,
   "group_size": 128,
-  "desc_act": true,
+  "desc_act": false,
   "hyb_act": false,
   "sym": true,
   "lm_head": false,
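quantize_config.json mirrors the config.json change: desc_act flips from true to false, which the README credits for the better tokens/s. For illustration only, a hypothetical sketch of a 4-bit GPTQ quantization run that would emit a comparable config; the AutoGPTQ library choice, the calibration text, and support for this exact base model are assumptions, not stated in this commit:

```python
# Hypothetical sketch of a 4-bit GPTQ quantization with desc_act=False.
# Illustrative only; the actual tool used for this checkpoint is not stated in the commit.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_id = "Qwen/Qwen3-4B-Thinking-2507"
tokenizer = AutoTokenizer.from_pretrained(base_id)

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights, as in quantize_config.json
    group_size=128,  # group size 128
    desc_act=False,  # the setting this commit highlights for better tokens/s
    sym=True,
)

# A tiny calibration set; real runs use a few hundred representative samples.
examples = [tokenizer("GPTQ calibration sample text.")]

model = AutoGPTQForCausalLM.from_pretrained(base_id, quantize_config)
model.quantize(examples)
model.save_quantized("Qwen3-4B-Thinking-2507-GPTQ-Int4")
```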