WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Paper • 2511.11434 • Published 11 days ago • 43
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published 21 days ago • 100
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Paper • 2511.01678 • Published 22 days ago • 34
From Charts to Code: A Hierarchical Benchmark for Multimodal Models Paper • 2510.17932 • Published Oct 20 • 7
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6 • 113
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published Apr 8 • 13
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models Paper • 2503.20198 • Published Mar 26 • 4
Automated Movie Generation via Multi-Agent CoT Planning Paper • 2503.07314 • Published Mar 10 • 44
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles Paper • 2503.03651 • Published Mar 5 • 16
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Paper • 2503.01774 • Published Mar 3 • 44
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 20 • 41
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation Paper • 2502.08047 • Published Feb 12 • 28
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published Feb 11 • 46