MARS: Enabling Autoregressive Models Multi-Token Generation Paper • 2604.07023 • Published 6 days ago • 35
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning Paper • 2604.04746 • Published 6 days ago • 67
Query-Kontext: An Unified Multimodal Model for Image Generation and Editing Paper • 2509.26641 • Published Sep 30, 2025 • 4
Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers Paper • 2603.27666 • Published 16 days ago • 18
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation Paper • 2604.03118 • Published 11 days ago • 6
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing Paper • 2604.04911 • Published 8 days ago • 35
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 8 days ago • 105
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision Paper • 2604.04934 • Published 8 days ago • 42
Welcome Gemma 4: Frontier multimodal intelligence on device Article • 12 days ago • 841
ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers Paper • 2603.24414 • Published 20 days ago • 183
Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published 13 days ago • 36
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers Paper • 2603.28762 • Published 15 days ago • 25
6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models Paper • 2603.18742 • Published 26 days ago • 10
DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing Paper • 2603.28713 • Published 15 days ago • 19