DongHyunKim committed on
Commit d61f4cc · verified · 1 Parent(s): 99d5983

Update README.md

Files changed (1)
  1. README.md +46 -1
README.md CHANGED
@@ -292,7 +292,52 @@ To speed up your inference, you can use the vLLM engine from [our repository](ht
  Make sure to switch to the `v0.9.2rc2_hyperclovax_vision_seed` branch.

  **Launch API server**:
- - https://oss.navercorp.com/HYPERSCALE-AI-VISION/vllm/blob/main/README.md
+ ```
+ pyenv virtualenv 3.10.2 .vllm
+ pyenv activate .vllm
+ sudo apt-get install -y kmod
+ pip install --upgrade setuptools wheel pip
+ pip install setuptools_scm
+
+ # Install from the latest commit (e.g. v0.9.0)
+ VLLM_USE_PRECOMPILED=1 pip install -e .[serve] --cache-dir=/mnt/tmp
+ pip install -U pynvml
+ pip install timm av decord
+
+ # ...or install from a previous commit (e.g. v0.8.4)
+ pip install -r ./requirements/build.txt
+ pip install -r ./requirements/common.txt
+ pip install -r ./requirements/cuda.txt
+ pip install flash_attn==2.7.4.post1
+ pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
+ export VLLM_COMMIT=dc1b4a6f1300003ae27f033afbdff5e2683721ce
+ export VLLM_PRECOMPILED_WHEEL_LOCATION=https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
+ VLLM_USE_PRECOMPILED=1 pip install -e .[serve] --cache-dir=/mnt/tmp
+ pip install -U pynvml
+ pip install timm av decord
+
+ # Then launch the API server
+ MODEL="/mnt/cmlssd004/public/donghyun/HCX_models/hcx-instruct/HyperCLOVAX-Seed-Vision-3B_250610"
+ export ATTENTION_BACKEND=FLASH_ATTN_VLLM_V1
+ VLLM_USE_V1=1 VLLM_ATTENTION_BACKEND=${ATTENTION_BACKEND} CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
+ --seed 20250525 \
+ --port ${NSML_PORT2} \
+ --allowed-local-media-path "/mnt/ocr-nfsx1/public/hodong.lee/cloned/vLLM/v0.8.4/vllm/ipynbs" \
+ --max-model-len 8192 \
+ --max-num-batched-tokens 8192 \
+ --max-num-seqs 128 \
+ --max-parallel-loading-workers 128 \
+ --limit-mm-per-prompt.image="32" \
+ --limit-mm-per-prompt.video="32" \
+ --max-num-frames 256 \
+ --tensor-parallel-size 1 \
+ --data-parallel-size 1 \
+ --model ${MODEL} \
+ --dtype float16 \
+ --trust-remote-code \
+ --chat-template-content-format "openai" \
+ --download-dir "/mnt/ocr-nfsx1/public_datasets/.cache/huggingface/hub"
+ ```
 
  **Request Example**:
  - https://github.com/vllm-project/vllm/pull/20931#issue-3229161410
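Once the server from the launch command above is up, it can be sanity-checked before sending any requests. This is a minimal sketch, assuming the server is listening on `${NSML_PORT2}` on the local host; `/health` and `/v1/models` are the standard endpoints exposed by vLLM's OpenAI-compatible server.

```
# Verify the server is ready and the model path is registered
# (assumes the launch command above, listening on ${NSML_PORT2}).
curl -s http://localhost:${NSML_PORT2}/health
curl -s http://localhost:${NSML_PORT2}/v1/models
```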
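The linked PR walks through full request examples; the snippet below is only a minimal sketch of a chat-completion call against the server launched above. The prompt and image URL are placeholders, and the `model` field reuses the `--model` path from the launch command.

```
# Hypothetical request: replace the prompt and image URL with your own.
curl http://localhost:${NSML_PORT2}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/mnt/cmlssd004/public/donghyun/HCX_models/hcx-instruct/HyperCLOVAX-Seed-Vision-3B_250610",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image."},
          {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}}
        ]
      }
    ],
    "max_tokens": 256
  }'
```

Because the server was started with `--chat-template-content-format "openai"`, the message content uses the OpenAI-style list-of-parts format shown above.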