license: apache-2.0
X-Omni-En (supports English text rendering)
Project Page | Paper | Code | HuggingFace Space
Highlights
- Unified Modeling Approach: A discrete autoregressive model handling image and language modalities.
- Superior Instruction Following: Exceptional capability to follow complex instructions.
- Superior Text Rendering: Accurately renders English text.
- Arbitrary Resolutions: Produces aesthetically pleasing images at arbitrary resolutions.
Citation
If you find this project helpful for your research or use it in your own work, please cite our paper:
@article{geng2025xomni,
author = {Zigang Geng and Yibing Wang and Yeyao Ma and Chen Li and Yongming Rao and Shuyang Gu and Zhao Zhong and Qinglin Lu and Han Hu and Xiaosong Zhang and Linus and Di Wang and Jie Jiang},
title = {X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again},
journal = {CoRR},
volume = {abs/2507.22058},
year = {2025},
}