papers read abs - a exhyy Collection

papers read abs

updated Dec 17, 2023

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023

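Note: Proposes a dataset for evaluating how well VLMs understand image-text pairs. The dataset consists of images and captions, where each caption includes sub-captions for different regions of the image.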
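
As a rough illustration of the structure the note describes (an image, a full caption, and sub-captions tied to regions of the image), one entry might be modeled as below. This is only a sketch: the class names, field names, and sample values are hypothetical and are not taken from the paper or its released data format.

```python
from dataclasses import dataclass, field


@dataclass
class RegionCaption:
    # Hypothetical identifier for a region of the image; how regions are
    # actually represented in the dataset is not stated in the note.
    region: str
    text: str


@dataclass
class DenseCaptionEntry:
    image_path: str
    caption: str
    regions: list[RegionCaption] = field(default_factory=list)

    def all_texts(self) -> list[str]:
        """Return the full caption followed by every region sub-caption."""
        return [self.caption] + [r.text for r in self.regions]


# Made-up sample values, for illustration only.
entry = DenseCaptionEntry(
    image_path="example.jpg",
    caption="A dog sits on a red couch next to a window.",
    regions=[
        RegionCaption(region="r0", text="a brown dog"),
        RegionCaption(region="r1", text="a red couch"),
    ],
)
```

Keeping the sub-captions nested under their image means an evaluation loop could score a model on the full caption and on each region description separately.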