HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation Paper • 2509.23736 • Published Sep 28 • 1
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Paper • 2510.24821 • Published 4 days ago • 27
Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model Paper • 2403.11077 • Published Mar 17, 2024
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Paper • 2505.02471 • Published May 5 • 15
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Paper • 2505.21457 • Published May 27 • 15
Ming-Omni: A Unified Multimodal Model for Perception and Generation Paper • 2506.09344 • Published Jun 11 • 28
GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks Paper • 2509.23738 • Published Sep 28 • 1
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer Paper • 2510.06590 • Published 24 days ago • 70
End-to-End Human Object Interaction Detection with HOI Transformer Paper • 2103.04503 • Published Mar 8, 2021
Improving Human-Object Interaction Detection via Phrase Learning and Label Composition Paper • 2112.07383 • Published Dec 14, 2021
Solutions for Fine-grained and Long-tailed Snake Species Recognition in SnakeCLEF 2022 Paper • 2207.01216 • Published Jul 4, 2022
DC-Former: Diverse and Compact Transformer for Person Re-Identification Paper • 2302.14335 • Published Feb 28, 2023
StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models Paper • 2409.02543 • Published Sep 4, 2024
Video Virtual Try-on with Conditional Diffusion Transformer Inpainter Paper • 2506.21270 • Published Jun 26