Video prefix incorrect: Frame{i} vs Frame-{i}

#6
by apurvagup - opened

In the GitHub repo, the video frame prefix used during fine-tuning / training is Frame-{i}. The Hugging Face example and the vLLM implementation both use Frame{i} (without the dash).

GitHub: https://github.com/OpenGVLab/InternVL/blob/2410d1dbf208f0e799459aff9376e5747dbf41a2/internvl_chat_gpt_oss/internvl/train/internvl_chat_finetune.py#L565

vLLM: https://github.com/vllm-project/vllm/blob/01413e0cf5a04da4049ffa38b6ff3df27ccabd06/vllm/model_executor/models/internvl.py#L673

HF: https://huggingface.co/OpenGVLab/InternVL3_5-8B#inference-with-transformers (under the video section)
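
To make the difference concrete, here is a minimal sketch of the two prefix styles. The helper `build_video_prefix` is hypothetical and is not taken from any of the linked files; the 1-based frame index and the `: <image>\n` suffix are assumptions based on my reading of the HF example, so please check them against the linked lines.

```python
def build_video_prefix(num_frames: int, dashed: bool) -> str:
    """Hypothetical helper: build the per-frame text prefix for a video prompt.

    dashed=True  -> "Frame-1: <image>\n..." (style described for the GitHub training code)
    dashed=False -> "Frame1: <image>\n..."  (style described for the HF example and vLLM)
    The 1-based index and the ": <image>\n" suffix are assumptions.
    """
    label = "Frame-{}" if dashed else "Frame{}"
    return "".join(f"{label.format(i + 1)}: <image>\n" for i in range(num_frames))


train_style = build_video_prefix(2, dashed=True)
infer_style = build_video_prefix(2, dashed=False)

# The two strings differ only by the dash, so the prompt text seen at
# inference time would not match the text used during training.
print(repr(train_style))
print(repr(infer_style))
```

If the training code really does use the dashed form, the inference-side prefix would need to change to match it (or the other way around).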

I assume the GitHub version is correct, since it is the one used for training, and that the vLLM and HF versions are incorrect. If you can confirm that this is the case, I can raise PRs to fix it in vLLM and HF.

@Weiyun1025
