Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Paper • 2511.19418 • Published 10 days ago • 26
CoLLM: A Large Language Model for Composed Image Retrieval Paper • 2503.19910 • Published Mar 25 • 15
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3 • 39