vllm: error: unrecognized arguments: --enable-reasoning

#7
by evilll - opened

VLLM_USE_MODELSCOPE=true VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve /home/user/model_zoo/Qwen3-Next-80B-A3B-Thinking --port 8000 --tensor-parallel-size 4 --max-model-len 262144 --enable-reasoning --reasoning-parser deepseek_r1
INFO 09-12 11:12:03 [__init__.py:216] Automatically detected platform cuda.
usage: vllm [-h] [-v] {chat,complete,serve,bench,collect-env,run-batch} ...
vllm: error: unrecognized arguments: --enable-reasoning

(qwen_next) user@share_dev_:$ /home/user/miniconda3/envs/qwen_next/bin/vllm --version
INFO 09-12 11:14:34 [__init__.py:216] Automatically detected platform cuda.
0.10.2rc3.dev1+g12a8414d8.cu129

The '--enable-reasoning' argument was removed from vLLM a few versions ago. You just need --reasoning-parser.

I believe the correct settings are:

--enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3
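
Putting that together with the original command, the full invocation would look something like this (paths, port, and parallelism taken from the original post; adjust for your own setup, and double-check the parser names against your vLLM version):

```shell
# Same command as above, with the removed --enable-reasoning flag dropped
# and the suggested tool-call/reasoning parsers added.
VLLM_USE_MODELSCOPE=true VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
vllm serve /home/user/model_zoo/Qwen3-Next-80B-A3B-Thinking \
  --port 8000 \
  --tensor-parallel-size 4 \
  --max-model-len 262144 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3
```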
