---
license: apache-2.0
language:
- es
- en
base_model:
- Qwen/Qwen3-Reranker-0.6B
pipeline_tag: text-ranking
---

# prudant/Qwen3-Reranker-0.6B-seq-cls-W8A8

This is a compressed version of tomaarsen/Qwen3-Reranker-0.6B-seq-cls, quantized with llm-compressor using the W8A8 scheme.

## Serving

```shell
python3 -m vllm.entrypoints.openai.api_server --model 'dolfsai/Qwen3-Reranker-0.6B-seq-cls-vllm-W8A8' --task classify
```

**Important**: You MUST read the following guide for correct usage of this model: [Guide](https://github.com/vllm-project/vllm/pull/19260)

## Model Details

- **Original Model**: tomaarsen/Qwen3-Reranker-0.6B-seq-cls
- **Quantization Method**: GPTQ (W8A8)
- **Compression Library**: [llm-compressor](https://github.com/vllm-project/llm-compressor)
- **Calibration Dataset**: ultrachat_200k (2,048 samples)
- **Optimized For**: Inference with vLLM
- **License**: Same as the original model
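For illustration, here is a minimal client sketch for scoring a query/document pair against the served model. It assumes the server from the command above is running on `localhost:8000` and that it exposes vLLM's `/score` endpoint with a `text_1`/`text_2` payload; endpoint names and the response schema can differ between vLLM versions, so check the linked guide before relying on this:

```python
import json
import urllib.request

# Model name taken from the serving command above.
MODEL = "dolfsai/Qwen3-Reranker-0.6B-seq-cls-vllm-W8A8"


def build_score_request(query: str, document: str, model: str = MODEL) -> dict:
    """Build a request body pairing one query with one document."""
    return {"model": model, "text_1": query, "text_2": document}


def score(query: str, document: str,
          base_url: str = "http://localhost:8000") -> float:
    """POST the pair to the (assumed) /score endpoint and return the score."""
    body = json.dumps(build_score_request(query, document)).encode()
    req = urllib.request.Request(
        f"{base_url}/score",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Response shape assumed here; adjust to your vLLM version's schema.
    return result["data"][0]["score"]
```

Higher scores indicate a stronger query/document match; apply the prompt formatting from the linked guide to the inputs before scoring, or relevance scores will be unreliable.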