Update README.md
README.md
CHANGED
@@ -17,6 +17,11 @@ Base LLM: [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5)
 The model can generate interleaved images and videos, despite the absence of image-video pairs in the dataset. Video-LLaVA uses an encoder trained for unified visual representation through alignment prior to projection.
 Extensive experiments demonstrate the complementarity of modalities, showcasing significant superiority when compared to models specifically designed for either images or videos.
 
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/videollava_example.png"
+alt="drawing" width="600"/>
+
+<small> VideoLLaVa example. Taken from the <a href="https://arxiv.org/abs/2311.10122">original paper.</a> </small>
+
 **Paper or resources for more information:**
 https://github.com/PKU-YuanGroup/Video-LLaVA
 
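The description above mentions a unified visual encoder aligned before projection on top of a Vicuna language backbone. The snippet below is a minimal, hedged sketch of running such a checkpoint through the Transformers `VideoLlavaProcessor` / `VideoLlavaForConditionalGeneration` classes; the checkpoint id, prompt template, and dummy clip are illustrative assumptions, not taken from this card.

```python
# Minimal usage sketch (assumptions noted above): run a Video-LLaVA checkpoint
# through the Transformers VideoLlava classes on a dummy 8-frame clip.
import numpy as np
from transformers import VideoLlavaForConditionalGeneration, VideoLlavaProcessor

model_id = "LanguageBind/Video-LLaVA-7B-hf"  # assumed checkpoint id, not taken from this card
processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(model_id)

# Video-LLaVA samples 8 frames per video; a random clip stands in for decoded frames here.
clip = np.random.randint(0, 256, size=(8, 224, 224, 3), dtype=np.uint8)

prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=clip, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

In practice the random clip would be replaced with frames decoded from a real video (for example with PyAV), uniformly sampled down to the 8 frames the model expects.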