Wrong output #2
by bullerwins - opened
The model doesn't produce any readable output, but vLLM appears to be generating tokens (see the response below).
Launch command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 VLLM_PP_LAYER_PARTITION=8,6,23,6,6,6,7 vllm serve \
  /mnt/llms/models/QuantTrio/MiniMax-M2-AWQ/ \
  --served-model-name MiniMax-M2-AWQ \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --swap-space 16 \
  --max-num-seqs 32 \
  --max-model-len 32000 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 1 -pp 7 \
  --enable-expert-parallel \
  --trust-remote-code \
  --disable-log-requests \
  --host 0.0.0.0 \
  --port 5000
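For reference (not from the original post): VLLM_PP_LAYER_PARTITION takes one layer count per pipeline stage, and the entries should sum to the model's num_hidden_layers (here 8+6+23+6+6+6+7 = 62). A minimal sanity check, assuming the AWQ repo ships a standard config.json at the path above:

python3 - <<'EOF'
import json

# Values copied from the launch command above.
partition = [8, 6, 23, 6, 6, 6, 7]
model_dir = "/mnt/llms/models/QuantTrio/MiniMax-M2-AWQ/"

with open(model_dir + "config.json") as f:
    cfg = json.load(f)

print("partition sum:    ", sum(partition))
print("num_hidden_layers:", cfg.get("num_hidden_layers"))
EOF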
System:
CUDA0=5090
CUDA1=3090
CUDA2=rtx6000
CUDA3=3090
CUDA4=3090
CUDA5=3090
CUDA6=5090
curl http://192.168.10.115:5000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "MiniMax-M2-AWQ",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Summarize the benefits of distributed training."}
],
"max_tokens": 300,
"temperature": 0.7
}'
Response:
{"id":"chatcmpl-182ff17cc60b4e1fa9269406320996b3","object":"chat.completion","created":1761677032,"model":"MiniMax-M2-AWQ","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":30,"total_tokens":330,"completion_tokens":300,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
Mixing Blackwell cards with older Ampere/Ada ones in a single run is pretty tough; vLLM's Blackwell support is still maturing. At this point, the most direct path is to report it upstream to the vLLM team.
Your GLM-4.6 quant worked perfectly with this same setup, though, so I was wondering whether it's something specific to this model.