Cannot load weights with latest transformers. (4.52.4)

#6
by thkim93 - opened

If I follow the script in the model card, the model weights do not load properly with transformers 4.52.4, as shown in the warning below.
This is probably because the key for PaliGemma's lm_head changed in https://github.com/huggingface/transformers/commit/17742bd9c8852ab35986dcaa3e68415342ae7eef.
Could you please re-upload the weights for the latest transformers? @tonywu71

- This IS NOT expected if you are initializing ColPaliForRetrieval from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ColPaliForRetrieval were not initialized from the model checkpoint at vidore/colpali-v1.3-hf and are newly initialized: ['vlm.lm_head.weight', 'vlm.model.language_model.embed_tokens.weight', 'vlm.model.language_model.layers.0.input_layernorm.weight', 

.....

'vlm.model.vision_tower.vision_model.post_layernorm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
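
For reference, the snippet I am running is roughly the following (a sketch based on the model card example; the dtype and device choices here are mine):

```python
import torch
from transformers import ColPaliForRetrieval, ColPaliProcessor

model_name = "vidore/colpali-v1.3-hf"

# This from_pretrained call is what prints the "newly initialized" warning above.
model = ColPaliForRetrieval.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # my choice; use whatever fits your hardware
    device_map="auto",
).eval()

processor = ColPaliProcessor.from_pretrained(model_name)
```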

I am facing this problem too. Please resolve it; I really need the model.

Vidore org

Hello, I thought this was resolved with 4.53.1? Can you confirm you have transformers 4.53.1 or later?

Yes, I've confirmed I'm using transformers==4.53.1, but I'm still facing the same loading issue: the query embedding shapes are wrong and retrieval is broken. I might also be loading it incorrectly, which is why I'm attaching these screenshots. Thank you so much for replying so quickly; if possible, please help me out with this.

[Screenshots attached: Capture3.PNG, Capture4.PNG]
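
Roughly what I am running, as a simplified sketch of the code in the screenshots (the dummy image and example query below are placeholders, not my real data):

```python
import torch
from PIL import Image
from transformers import ColPaliForRetrieval, ColPaliProcessor

model_name = "vidore/colpali-v1.3-hf"
model = ColPaliForRetrieval.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = ColPaliProcessor.from_pretrained(model_name)

# Placeholder inputs; in my real code these come from my document collection.
images = [Image.new("RGB", (448, 448), color="white")]
queries = ["What does the document say about revenue?"]

batch_images = processor(images=images, return_tensors="pt").to(model.device)
batch_queries = processor(text=queries, return_tensors="pt").to(model.device)

with torch.no_grad():
    image_embeddings = model(**batch_images).embeddings
    query_embeddings = model(**batch_queries).embeddings

# I expect multi-vector embeddings of shape (batch, sequence_length, 128),
# but the query embedding shapes I get back do not look right on my side.
print(query_embeddings.shape, image_embeddings.shape)

scores = processor.score_retrieval(query_embeddings, image_embeddings)
print(scores)
```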

Vidore org

Transformers changed its structure for VLMs with 4.52... I can re-upload the weights.
In the meantime, can you use the versions with PEFT adapters (colpali-v1.3, or an even better model, colqwen2-v1.0)?
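
Something along these lines should work for the adapter version (an untested sketch; adjust device and dtype to your setup):

```python
import torch
from colpali_engine.models import ColPali, ColPaliProcessor

# Loads the PaliGemma base model and applies the colpali-v1.3 PEFT adapter on top.
model = ColPali.from_pretrained(
    "vidore/colpali-v1.3",
    torch_dtype=torch.bfloat16,  # pick a dtype/device that fits your hardware
    device_map="auto",
).eval()
processor = ColPaliProcessor.from_pretrained("vidore/colpali-v1.3")
```

Embedding and scoring then go through processor.process_images / processor.process_queries and processor.score_multi_vector, as in the colpali-engine README.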

Thanks for the suggestion! I actually tried loading vidore/colpali-v1.3 via colpali-engine, but it still resolves to paligemma-3b-pt-448-base and throws the same unused-weights warning (see screenshot). It seems the checkpoint is pointing to the wrong base model or adapter config. (Please let me know whether the screenshot matches how you suggested loading it and the issue persists, or whether I am doing something wrong; sorry, I'm a little new to this field.)
Would really appreciate it if you could re-upload the fixed version; I'm blocked on query embedding shapes and retrieval accuracy right now 🙏
In the meantime, I will try out the colqwen2-v1.0 model.
[Screenshot attached: Capture5.PNG]

Vidore org

hey, the adapters should be good now (install colpali-engine from source). I have a PR open on the transformers repo to fix the HF model.

Vidore org
edited Jul 13

let me know if all is good for the adapter version on your end (after pip install git+https://github.com/illuin-tech/colpali)
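
For example, for the colqwen2-v1.0 adapter mentioned earlier, loading should look roughly like this (a sketch, not tested on your exact setup):

```python
import torch
from colpali_engine.models import ColQwen2, ColQwen2Processor

# Same idea as colpali-v1.3: a base VLM plus a PEFT adapter, loaded via colpali-engine.
model = ColQwen2.from_pretrained(
    "vidore/colqwen2-v1.0",
    torch_dtype=torch.bfloat16,  # adjust dtype/device to your hardware
    device_map="auto",
).eval()
processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")
```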

Can you give me access to colpali-engine or make it public? I also tried generating my own token to use it.
[Screenshot attached: Capture.PNG]

Vidore org

illuin-tech/colpali (that's the name of the repo)

it is working, thank you so much for the help!

I can confirm it is working with your fix. I had the same problem and came here looking for a solution 👍
