dolfsai
/

Qwen3-Reranker-0.6B-seq-cls-vllm-W8A8

8-bit precision

compressed-tensors

Model card Files Files and versions

prudant commited on Aug 18

Commit

c627f91

·

verified ·

1 Parent(s): 66641f6

Update README.md

Files changed (1) hide show

README.md +6 -0

README.md CHANGED Viewed

@@ -12,6 +12,12 @@ pipeline_tag: text-ranking
 This is a compressed version of tomaarsen/Qwen3-Reranker-0.6B-seq-cls using llm-compressor with the following scheme: W8A8
 ## Model Details
 - **Original Model**: tomaarsen/Qwen3-Reranker-0.6B-seq-cls

 This is a compressed version of tomaarsen/Qwen3-Reranker-0.6B-seq-cls using llm-compressor with the following scheme: W8A8
+## Serving
+``python3 -m vllm.entrypoints.openai.api_server --download-dir '/data' --model 'dolfsai/Qwen3-Reranker-0.6B-seq-cls-vllm-W8A8' --task classify``
+**Important**: You MUST read the following guide for correct usage of this model here [Guide](https://github.com/vllm-project/vllm/pull/19260)
 ## Model Details
 - **Original Model**: tomaarsen/Qwen3-Reranker-0.6B-seq-cls