---
license: apache-2.0
language:
- es
- en
base_model:
- Qwen/Qwen3-Reranker-0.6B
pipeline_tag: text-ranking
---

# prudant/Qwen3-Reranker-0.6B-seq-cls-W8A8

This is a compressed version of tomaarsen/Qwen3-Reranker-0.6B-seq-cls, quantized with llm-compressor using the W8A8 scheme.

## Serving

```shell
python3 -m vllm.entrypoints.openai.api_server --model 'dolfsai/Qwen3-Reranker-0.6B-seq-cls-vllm-W8A8' --task classify
```

**Important**: You MUST read the following guide for correct usage of this model: [Guide](https://github.com/vllm-project/vllm/pull/19260)

## Model Details

- **Original Model**: tomaarsen/Qwen3-Reranker-0.6B-seq-cls
- **Quantization Method**: GPTQ (W8A8)
- **Compression Library**: [llm-compressor](https://github.com/vllm-project/llm-compressor)
- **Calibration Dataset**: ultrachat_200k (2,048 samples)
- **Optimized For**: Inference with vLLM
- **License**: Same as the original model
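For illustration, here is a minimal client sketch for scoring a query/document pair against the served model. It assumes the server from the command above is running on `localhost:8000` and that it exposes vLLM's `/score` endpoint with a `text_1`/`text_2` payload; endpoint names and the response schema can differ between vLLM versions, so check the linked guide before relying on this:

```python
import json
import urllib.request

# Model name taken from the serving command above.
MODEL = "dolfsai/Qwen3-Reranker-0.6B-seq-cls-vllm-W8A8"


def build_score_request(query: str, document: str, model: str = MODEL) -> dict:
    """Build a request body pairing one query with one document."""
    return {"model": model, "text_1": query, "text_2": document}


def score(query: str, document: str,
          base_url: str = "http://localhost:8000") -> float:
    """POST the pair to the (assumed) /score endpoint and return the score."""
    body = json.dumps(build_score_request(query, document)).encode()
    req = urllib.request.Request(
        f"{base_url}/score",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Response shape assumed here; adjust to your vLLM version's schema.
    return result["data"][0]["score"]
```

Higher scores indicate a stronger query/document match; apply the prompt formatting from the linked guide to the inputs before scoring, or relevance scores will be unreliable.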