BryanW committed on
Commit 00955cb · verified · 1 Parent(s): 14bf825

Update README.md

Files changed (1)
  1. README.md +13 -7
README.md CHANGED
@@ -7,7 +7,7 @@ sdk: static
  pinned: false
  ---

- # MeissonFlow Research [[Join us]](mailto:[email protected])
+ # MeissonFlow Research [[Join us]](mailto:[email protected]) [[Fund us]](mailto:[email protected])

  **MeissonFlow Research** is a non-commercial research group dedicated to advancing generative modeling techniques for structured visual and multimodal content creation.
  We aim to design models and algorithms that help creators produce high-quality content with greater efficiency and control.
@@ -15,18 +15,24 @@ We aim to design models and algorithms that help creators produce high-quality c
  Our journey began with [**MaskGIT**](https://arxiv.org/abs/2202.04200), a pioneering work by [**Huiwen Chang**](https://scholar.google.com/citations?hl=en&user=eZQNcvcAAAAJ), which introduced a bidirectional transformer decoder for image synthesis—outperforming traditional raster-scan autoregressive (AR) generation.
  This paradigm was later extended to text-to-image synthesis in [**MUSE**](https://arxiv.org/abs/2301.00704).

- Building upon these foundations, we scaled masked generative modeling with the latest architectural designs and sampling strategiesculminating in [**Monetico** and **Meissonic**](https://github.com/viiika/Meissonic) from scratch, which on par with leading diffusion models such as SDXL, while maintaining greater efficiency.
+ Building upon these foundations, we scaled masked generative modeling with the latest architectural designs and sampling strategies, culminating in [**Monetico** and **Meissonic**](https://github.com/viiika/Meissonic), built from scratch, which are on par with leading diffusion models such as SDXL while maintaining greater efficiency.

- Having verified the effectiveness of this approach, we began to ask a deeper question one that reaches beyond performance benchmarks: **what foundations are required for general-purpose generative intelligence**?
- Through discussions with researchers at Safe Superintelligence (SSI) Club, University of Illinois Urbana-Champaign (UIUC) and Riot Video Games, we converged on the vision of a **visual-centric world model** a generative and interactive system capable of simulating, interacting with, and reasoning about multimodal environments.
+ Having verified the effectiveness of this approach, we began to ask a deeper question, one that reaches beyond performance benchmarks: **what foundations are required for general-purpose generative intelligence**?
+ Through discussions with researchers at the Safe Superintelligence (SSI) Club, the University of Illinois Urbana-Champaign (UIUC), and Riot Video Games, we converged on the vision of a **visual-centric world model**: a generative and interactive system capable of simulating, interacting with, and reasoning about multimodal environments.

  > We believe that **masking** is a fundamental abstraction for building such controllable, efficient, and generalizable intelligence.

- A similar vision was shared by [**Stefano Ermon**](https://cs.stanford.edu/~ermon/) at ICLR 2025, where he described *Diffusion as a unified paradigm for a multi-modal world model* a message that echoes and strengthens our belief: that unified generative modeling is the path toward general-purpose superintelligence.
+ A similar vision was shared by [**Stefano Ermon**](https://cs.stanford.edu/~ermon/) at ICLR 2025, where he described *Diffusion as a unified paradigm for a multi-modal world model*, a message that echoes and strengthens our belief that unified generative modeling is the path toward general-purpose superintelligence.

  To pursue this vision, we introduced [**Muddit** and **Muddit Plus**](https://github.com/M-E-AGI-Lab/Muddit), unified generative models built upon visual priors (Meissonic), and capable of unified generation across text and image within a single architecture and paradigm.

- We look forward to releasing more models and algorithms in this direction.
- We thank our amazing teammates — and you, the reader for your interest in our work.
+ We want to build world models on visual priors, though we concede that the language prior dominates current unified models.
+ Inspired by the success of Mercury by [**Inception Labs**](https://www.inceptionlabs.ai/),
+ we developed [**Lumina-DiMOO**](https://arxiv.org/abs/2510.06308). At a larger scale than Muddit, Lumina-DiMOO is a unified masked diffusion model that achieves state-of-the-art performance among discrete diffusion models to date, and we are still pushing it further! It integrates high-resolution image generation with multimodal capabilities, including text-to-image, image-to-image, and image understanding.
+
+ To further clarify our roadmap, we articulated our perspective in [**From Masks to Worlds: A Hitchhiker’s Guide to World Models**](https://arxiv.org/abs/2510.20668), which traces a five-stage path from early masked modeling to unified generative modeling and the future we are building.
+
+ We look forward to releasing more models and algorithms in this direction. We post related papers and those from our model family [here](https://github.com/viiika/Meissonic).
+ We thank our amazing teammates and you, the reader, for your interest in our work.

  Special thanks to [**Style2Paints Research**](https://lllyasviel.github.io/Style2PaintsResearch/), which helped shape our taste and research direction in the early days.