VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published 30 days ago
ViCrop: Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal Large Language Models Paper • 2310.16033 • Published Oct 24, 2023
Exploring Perceptual Limitation of Multimodal Large Language Models Paper • 2402.07384 • Published Feb 12, 2024
PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales Paper • 2211.01562 • Published Nov 3, 2022
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes Paper • 2409.04053 • Published Sep 6, 2024
MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning Paper • 2404.13591 • Published Apr 21, 2024