StoryMem: Multi-shot Long Video Storytelling with Memory • Paper • 2512.19539 • Published 6 days ago • 16
In-Video Instructions: Visual Signals as Generative Control • Paper • 2511.19401 • Published Nov 24 • 30
Video-As-Prompt: Unified Semantic Control for Video Generation • Paper • 2510.20888 • Published Oct 23 • 45
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation • Paper • 2510.18692 • Published Oct 21 • 40
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling • Paper • 2510.09212 • Published Oct 10 • 17
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation • Paper • 2510.08673 • Published Oct 9 • 125
Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published Oct 13 • 165
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer • Paper • 2508.10893 • Published Aug 14 • 31
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels • Paper • 2507.21809 • Published Jul 29 • 136
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again • Paper • 2507.22058 • Published Jul 29 • 39
Epona: Autoregressive Diffusion World Model for Autonomous Driving • Paper • 2506.24113 • Published Jun 30 • 1
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer • Paper • 2408.06072 • Published Aug 12, 2024 • 39
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing • Paper • 2312.07409 • Published Dec 12, 2023 • 23
CogAgent: A Visual Language Model for GUI Agents • Paper • 2312.08914 • Published Dec 14, 2023 • 31