How much VRAM is needed to run this model? 8x RTX 3090 = 192 GB isn't enough to serve it at the full context length.
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 && vllm serve /models/tclf90/Qwen3-VL-235B-A22B-Thinking-AWQ \
  --enable-expert-parallel \
  --api-key token-deaf \
  --port 12303 \
  --gpu-memory-utilization 0.98 \
  --max-num-seqs 16 \
  --max-model-len 131072 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --served-model-name qwen3-vl-thinking
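For context, a rough per-GPU budget suggests why so little memory is left for the KV cache. This is only a back-of-envelope sketch; the ~120 GB size of the AWQ checkpoint and the 24 GiB per RTX 3090 are assumptions, not values from the log:

# back-of-envelope per-GPU budget (assumed: ~120 GB AWQ weights on disk, 24 GiB per card)
WEIGHTS_GIB=$(echo "120 * 10^9 / 2^30" | bc -l)              # ~111.8 GiB of weights in total
echo "weights per GPU: $(echo "$WEIGHTS_GIB / 8" | bc -l)"   # ~14 GiB each with --tensor-parallel-size 8
echo "budget per GPU:  $(echo "24 * 0.98" | bc -l)"          # ~23.5 GiB at --gpu-memory-utilization 0.98
# The remaining ~9 GiB per GPU is shared by activations, CUDA graphs and the vision
# encoder, which is consistent with only ~2.5 GiB being left for the KV cache below.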
Minimal vLLM Failure Log
Pre-Failure Process
Compilation Time: Workers spend approximately 37 seconds on torch.compile (Dynamo bytecode transform) and then load the compiled graphs in ~13 seconds.
Note: The main process logged a warning about processes hanging/compiling for 60 seconds.
KV Cache Allocation Check: Before the crash, the workers report the available KV cache memory:
Available KV Cache Memory per Worker: 2.55 GiB
Critical Failure (VRAM Insufficiency)
Error Reason: EngineCore failed to start.
Root Cause: ValueError: To serve at least one request with the models's max seq len (131072), (5.88 GiB KV cache is needed, which is larger than the available KV cache memory (2.55 GiB).
Diagnosis: The engine attempted to initialize the KV cache but found the required memory (5.88 GiB per worker) was more than double the available memory (2.55 GiB per worker).
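These numbers line up with the usual KV-cache arithmetic. The sketch below assumes the text backbone matches Qwen3-235B-A22B (94 layers, 4 KV heads, head_dim 128, bf16 cache), which I have not verified for the VL variant; with TP=8 and only 4 KV heads, each GPU effectively stores one replicated head:

# rough KV-cache arithmetic (assumed config: 94 layers, 4 KV heads, head_dim 128, bf16)
LAYERS=94; KV_HEADS_PER_GPU=1; HEAD_DIM=128; DTYPE_BYTES=2                 # max(1, 4 heads / TP 8) = 1 head per GPU
PER_TOKEN=$((2 * LAYERS * KV_HEADS_PER_GPU * HEAD_DIM * DTYPE_BYTES))      # K and V -> 48128 bytes/token per GPU
echo "needed for 131072 tokens: $(echo "131072 * $PER_TOKEN / 2^30" | bc -l) GiB"    # ~5.88 GiB, matching the error
echo "fits in 2.55 GiB:         $(echo "2.55 * 2^30 / $PER_TOKEN" | bc -l) tokens"   # ~56.9k, matching the estimate

The close match suggests the limit really is KV-cache capacity at this context length, not fragmentation or a misconfiguration.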
Recommended Action:
Increase gpu_memory_utilization (if possible, though it was already at 0.98).
Decrease max_model_len (an adjusted launch command is sketched after this list).
Estimated Max Length: Based on the available memory, the estimated maximum model length is 56800 (down from the requested 131072).
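One way to apply the second recommendation is sketched below. This is untested on this setup: the flags are standard vLLM options, but the chosen length of 49152 (comfortably under the ~56.8k estimate) is a suggestion, not a value from the log:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 && vllm serve /models/tclf90/Qwen3-VL-235B-A22B-Thinking-AWQ \
  --enable-expert-parallel \
  --api-key token-deaf \
  --port 12303 \
  --gpu-memory-utilization 0.98 \
  --max-num-seqs 16 \
  --max-model-len 49152 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --served-model-name qwen3-vl-thinking

If a longer context is required, quantizing the KV cache (--kv-cache-dtype fp8) roughly halves the per-token footprint, but whether it works with this quantized model on RTX 3090s would need to be verified; even then, 131072 would remain a tight fit.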
Final Status: The API server received a RuntimeError: Engine core initialization failed and the process shut down.