Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published 7 days ago • 35
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published 7 days ago • 35
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published 7 days ago • 35
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning Paper • 2506.03136 • Published Jun 3 • 24
VMoBA: Mixture-of-Block Attention for Video Diffusion Models Paper • 2506.23858 • Published Jun 30 • 31
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8 • 56
VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs Paper • 2308.02117 • Published Aug 4, 2023
RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models Paper • 2402.12908 • Published Feb 20, 2024 • 10
VideoTetris: Towards Compositional Text-to-Video Generation Paper • 2406.04277 • Published Jun 6, 2024 • 25
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation Paper • 2502.12148 • Published Feb 17 • 17
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening Paper • 2502.12146 • Published Feb 17 • 16
Training-free Diffusion Acceleration with Bottleneck Sampling Paper • 2503.18940 • Published Mar 24 • 12