---
title: README
emoji: 🏃
colorFrom: yellow
colorTo: indigo
sdk: static
pinned: false
---

# MeissonFlow Research [[Join us]](mailto:[email protected]) [[Fund us]](mailto:[email protected])

**MeissonFlow Research** is a non-commercial research group dedicated to advancing generative modeling techniques for structured visual and multimodal content creation. 
We aim to design models and algorithms that help creators produce high-quality content with greater efficiency and control.

Our journey began with [**MaskGIT**](https://arxiv.org/abs/2202.04200), a pioneering work by [**Huiwen Chang**](https://scholar.google.com/citations?hl=en&user=eZQNcvcAAAAJ), which introduced a bidirectional transformer decoder for image synthesis and outperformed traditional raster-scan autoregressive (AR) generation.
This paradigm was later extended to text-to-image synthesis in [**MUSE**](https://arxiv.org/abs/2301.00704).
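
For readers unfamiliar with this paradigm, the sketch below illustrates the kind of iterative parallel decoding that MaskGIT popularized: start from a fully masked token grid, predict every position at once with a bidirectional model, keep the most confident predictions, and re-mask the rest on a shrinking schedule. The `toy_model`, codebook size, and cosine schedule here are illustrative placeholders, not the actual MaskGIT/MUSE implementation.

```python
# Schematic sketch of MaskGIT-style iterative parallel decoding.
# `toy_model` is a stand-in for a bidirectional transformer that returns
# logits over the codebook at every position; it is NOT the real model.
import numpy as np

VOCAB = 1024        # size of the VQ codebook (illustrative)
MASK_ID = VOCAB     # extra token id marking masked positions
SEQ_LEN = 256       # e.g. a 16x16 grid of image tokens
rng = np.random.default_rng(0)

def toy_model(tokens):
    """Placeholder: random logits of shape (SEQ_LEN, VOCAB).
    A real model would condition on the partially decoded `tokens`."""
    return rng.normal(size=(SEQ_LEN, VOCAB))

def decode(steps=8):
    tokens = np.full(SEQ_LEN, MASK_ID)              # start fully masked
    for t in range(steps):
        logits = toy_model(tokens)
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        pred = probs.argmax(-1)                     # predict all positions in parallel
        conf = probs.max(-1)
        conf[tokens != MASK_ID] = np.inf            # never re-mask already fixed tokens
        # cosine schedule: how many tokens remain masked after this step
        keep_masked = int(SEQ_LEN * np.cos(np.pi / 2 * (t + 1) / steps))
        order = np.argsort(-conf)                   # most confident first
        new_tokens = np.full(SEQ_LEN, MASK_ID)
        unmask = order[: SEQ_LEN - keep_masked]
        new_tokens[unmask] = np.where(tokens[unmask] != MASK_ID,
                                      tokens[unmask], pred[unmask])
        tokens = new_tokens
    return tokens

print(decode()[:16])  # first row of decoded image-token ids
```

Unlike raster-scan AR decoding, which emits one token per forward pass, this loop fills in all tokens within a small, fixed number of steps, which is the efficiency argument behind the models described below.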

Building upon these foundations, we scaled masked generative modeling with modern architectural designs and sampling strategies, culminating in [**Monetico** and **Meissonic**](https://github.com/viiika/Meissonic), trained from scratch, which are on par with leading diffusion models such as SDXL while offering greater efficiency.

Having verified the effectiveness of this approach, we began to ask a deeper question, one that reaches beyond performance benchmarks: **what foundations are required for general-purpose generative intelligence**?  
Through discussions with researchers at the Safe Superintelligence (SSI) Club, the University of Illinois Urbana-Champaign (UIUC), and Riot Games, we converged on the vision of a **visual-centric world model**: a generative and interactive system capable of simulating, interacting with, and reasoning about multimodal environments.

> We believe that **masking** is a fundamental abstraction for building such controllable, efficient, and generalizable intelligence.

A similar vision was shared by [**Stefano Ermon**](https://cs.stanford.edu/~ermon/) at ICLR 2025, where he described *diffusion as a unified paradigm for a multi-modal world model*, a message that echoes and strengthens our belief that unified generative modeling is the path toward general-purpose superintelligence.

To pursue this vision, we introduced [**Muddit** and **Muddit Plus**](https://github.com/M-E-AGI-Lab/Muddit), unified generative models built upon a visual prior (Meissonic) and capable of generating both text and images within a single architecture and paradigm.

We want to build world models around a visual prior, though we must sadly concede that the language prior dominates current unified models.
Inspired by the success of Mercury by [**Inception Labs**](https://www.inceptionlabs.ai/),
we developed [**Lumina-DiMOO**](https://arxiv.org/abs/2510.06308), a unified masked diffusion model at a larger scale than Muddit. Lumina-DiMOO achieves state-of-the-art performance among discrete diffusion models to date, and we are still pushing it further. It integrates high-resolution image generation with multimodal capabilities, including text-to-image, image-to-image, and image understanding.

To further clarify our long-term direction, we articulated our perspective in [**From Masks to Worlds: A Hitchhiker’s Guide to World Models**](https://arxiv.org/abs/2510.20668), which traces a five-stage roadmap from early masked modeling to unified generative modeling and the future we are building.

We look forward to releasing more models and algorithms in this direction. We post related papers and those from our model family [here](https://github.com/viiika/Meissonic).
We thank our amazing teammates and you, the reader, for your interest in our work.

Special thanks to [**Style2Paints Research**](https://lllyasviel.github.io/Style2PaintsResearch/), which helped shape our taste and research direction in the early days.