Why does the ViViT in Hugging Face have no factorized encoder and so on?

#4
by tsaganshosg - opened

Dear friends,
I find that the PyTorch implementation here has no "factorized encoder", "factorized self-attention", etc.
The patchifying and attention implemented here are just the simple "joint space-time" variant.
Thank you!
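For context, a minimal NumPy sketch of what "joint space-time" tubelet patchifying means: the whole video is cut into t x h x w tubelets and every tubelet becomes one token, so a single encoder attends over all space-time tokens at once. The sizes below (32 frames, 224x224 resolution, 2x16x16 tubelets) are assumed defaults for illustration, not read from the Hugging Face code.

```python
import numpy as np

# Sketch of joint space-time tubelet embedding (illustrative only).
# A video (T, H, W, C) is split into non-overlapping t x h x w tubelets;
# each tubelet is flattened into one token.
T, H, W, C = 32, 224, 224, 3   # assumed ViViT-style defaults
t, h, w = 2, 16, 16            # tubelet size along time/height/width

video = np.zeros((T, H, W, C))

# Group axes into (T//t, t, H//h, h, W//w, w, C), bring the tubelet
# axes together, then flatten each tubelet into a single token vector.
tubelets = video.reshape(T // t, t, H // h, h, W // w, w, C)
tubelets = tubelets.transpose(0, 2, 4, 1, 3, 5, 6)
tokens = tubelets.reshape(-1, t * h * w * C)

print(tokens.shape)  # (3136, 1536): 16*14*14 tokens, each 2*16*16*3 values
```

All 3136 tokens go through one transformer together, which is why this is the "joint" (Model 1) scheme rather than a factorized one.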

I am also aware that this implementation is just Model 1 from the paper.
How can we get the pretrained weights for Models 2-4?
Thank you.
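To make the contrast concrete, here is a shape-level sketch (not the Hugging Face implementation) of how Model 1 (joint space-time) differs from Model 2 (factorized encoder) in the ViViT paper: Model 2 runs a spatial encoder per frame, pools each frame to one representation, and only then runs a temporal encoder. The `dummy_encoder` below is a hypothetical stand-in for a real transformer encoder.

```python
import numpy as np

def dummy_encoder(tokens):
    """Hypothetical stand-in for a transformer encoder over (seq, dim) tokens."""
    return tokens

n_t, n_h, n_w, d = 16, 14, 14, 768   # tubelets along time/height/width

# Model 1 -- joint space-time: one encoder over all n_t*n_h*n_w tokens.
joint_tokens = np.zeros((n_t * n_h * n_w, d))
joint_out = dummy_encoder(joint_tokens)       # sequence length 3136

# Model 2 -- factorized encoder: a spatial encoder per frame, then a
# temporal encoder over one pooled representation per frame.
frames = np.zeros((n_t, n_h * n_w, d))
spatial_out = np.stack([dummy_encoder(f) for f in frames])
frame_repr = spatial_out.mean(axis=1)         # (n_t, d), e.g. CLS/mean pool
temporal_out = dummy_encoder(frame_repr)      # sequence length only n_t

print(joint_out.shape, temporal_out.shape)    # (3136, 768) (16, 768)
```

The factorization shrinks the temporal attention to `n_t` tokens instead of `n_t * n_h * n_w`, which is the efficiency argument for Models 2-4 in the paper.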
