Make sure to switch to the `v0.9.2rc2_hyperclovax_vision_seed` branch.

**Launch API server**:

```
pyenv virtualenv 3.10.2 .vllm
pyenv activate .vllm
sudo apt-get install -y kmod
pip install --upgrade setuptools wheel pip
pip install setuptools_scm

# Install from the latest commit (e.g. v0.9.0)
VLLM_USE_PRECOMPILED=1 pip install -e .[serve] --cache-dir=/mnt/tmp
pip install -U pynvml
pip install timm av decord

# Or install from a previous commit (e.g. v0.8.4)
pip install -r ./requirements/build.txt
pip install -r ./requirements/common.txt
pip install -r ./requirements/cuda.txt
pip install flash_attn==2.7.4.post1
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
export VLLM_COMMIT=dc1b4a6f1300003ae27f033afbdff5e2683721ce
export VLLM_PRECOMPILED_WHEEL_LOCATION=https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
VLLM_USE_PRECOMPILED=1 pip install -e .[serve] --cache-dir=/mnt/tmp
pip install -U pynvml
pip install timm av decord

# Then launch the API server
MODEL="/mnt/cmlssd004/public/donghyun/HCX_models/hcx-instruct/HyperCLOVAX-Seed-Vision-3B_250610"
export ATTENTION_BACKEND=FLASH_ATTN_VLLM_V1
VLLM_USE_V1=1 VLLM_ATTENTION_BACKEND=${ATTENTION_BACKEND} CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
  --seed 20250525 \
  --port ${NSML_PORT2} \
  --allowed-local-media-path "/mnt/ocr-nfsx1/public/hodong.lee/cloned/vLLM/v0.8.4/vllm/ipynbs" \
  --max-model-len 8192 \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 128 \
  --max-parallel-loading-workers 128 \
  --limit-mm-per-prompt.image="32" \
  --limit-mm-per-prompt.video="32" \
  --max-num-frames 256 \
  --tensor-parallel-size 1 \
  --data-parallel-size 1 \
  --model ${MODEL} \
  --dtype float16 \
  --trust-remote-code \
  --chat-template-content-format "openai" \
  --download-dir "/mnt/ocr-nfsx1/public_datasets/.cache/huggingface/hub"
```

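Once the server is up, you can sanity-check it before sending multimodal requests by listing the served models. A minimal sketch, assuming the server listens on localhost at the port you passed via `--port` (the `models_url`/`list_models` helper names are ours; `/v1/models` is the standard OpenAI-compatible route vLLM exposes):

```python
import json
import urllib.request


def models_url(host: str, port: int) -> str:
    """Build the OpenAI-compatible model-listing endpoint URL."""
    return f"http://{host}:{port}/v1/models"


def list_models(host: str = "localhost", port: int = 8000) -> list:
    """Return the model ids served by the running vLLM instance."""
    with urllib.request.urlopen(models_url(host, port)) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]


if __name__ == "__main__":
    # Requires the API server above to be running on the given port.
    print(list_models(port=8000))
```

If the launch succeeded, the returned list should contain the path you passed as `--model`.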
**Request Example**:
- https://github.com/vllm-project/vllm/pull/20931#issue-3229161410
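The PR linked above documents the exact request payload for this model; as a rough illustration of the general shape, the sketch below assembles an OpenAI-style chat completion with one image part and POSTs it to the local server. The model name, image URL, and helper names here are placeholders, not values from this repository:

```python
import json
import urllib.request


def build_chat_request(model: str, text: str, image_url: str) -> dict:
    """Assemble an OpenAI-style chat completion payload with one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 256,
    }


def send(url: str, payload: dict) -> dict:
    """POST the payload to the chat-completions endpoint and decode the reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_chat_request(
        model="HyperCLOVAX-Seed-Vision-3B",      # placeholder: use the path passed to --model
        text="Describe this image.",
        image_url="https://example.com/sample.jpg",  # placeholder image URL
    )
    print(send("http://localhost:8000/v1/chat/completions", payload))
```

Local file paths are only accepted as image URLs if they fall under the directory given to `--allowed-local-media-path`.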