Improve model card: Add `library_name`, `license` metadata, GitHub badge, and HF paper link

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +38 -5
README.md CHANGED
````diff
@@ -1,6 +1,9 @@
 ---
+base_model:
+- Qwen/Qwen2.5-VL-3B-Instruct
 language:
 - en
+pipeline_tag: image-text-to-text
 tags:
 - vision
 - object-detection
@@ -10,11 +13,12 @@ tags:
 - visual-prompting
 - open-set-detection
 - object-pointing
-pipeline_tag: image-text-to-text
-base_model:
-- Qwen/Qwen2.5-VL-3B-Instruct
+library_name: transformers
+license: other
 ---
 
+This model is **Rex-Omni**, a 3B-parameter Multimodal Large Language Model (MLLM) presented in the paper "[Detect Anything via Next Point Prediction](https://huggingface.co/papers/2510.12798)". It is compatible with the Hugging Face `transformers` library and is licensed under the [IDEA License 1.0](https://github.com/IDEA-Research/Rex-Omni/blob/main/LICENSE).
+
 <div align=center>
 <img src="assets/logo.png" width=600 >
 </div>
@@ -48,7 +52,12 @@ base_model:
     alt="RexThinker Demo on Hugging Face"
   />
 </a>
-
+<a href="https://github.com/IDEA-Research/Rex-Omni">
+  <img
+    src="https://img.shields.io/badge/GitHub-Code-blue?logo=github&logoColor=white"
+    alt="GitHub Code"
+  />
+</a>
 </p>
 
 </div>
@@ -131,4 +140,28 @@ Rex-Omni is licensed under the [IDEA License 1.0](LICENSE), Copyright (c) IDEA.
 
 For questions and feedback, please contact us at:
 - Email: [email protected]
-- GitHub Issues: [IDEA-Research/Rex-Omni](https://github.com/IDEA-Research/Rex-Omni/issues)
+- GitHub Issues: [IDEA-Research/Rex-Omni](https://github.com/IDEA-Research/Rex-Omni/issues)
+
+## 7. Citation
+Rex-Omni builds on a series of prior works; if you're interested, take a look:
+
+- [RexThinker](https://arxiv.org/abs/2506.04034)
+- [RexSeek](https://arxiv.org/abs/2503.08507)
+- [ChatRex](https://arxiv.org/abs/2411.18363)
+- [DINO-X](https://arxiv.org/abs/2411.14347)
+- [Grounding DINO 1.5](https://arxiv.org/abs/2405.10300)
+- [T-Rex2](https://link.springer.com/chapter/10.1007/978-3-031-73414-4_3)
+- [T-Rex](https://arxiv.org/abs/2311.13596)
+
+
+```bibtex
+@misc{jiang2025detectpointprediction,
+  title={Detect Anything via Next Point Prediction},
+  author={Qing Jiang and Junan Huo and Xingyu Chen and Yuda Xiong and Zhaoyang Zeng and Yihao Chen and Tianhe Ren and Junzhi Yu and Lei Zhang},
+  year={2025},
+  eprint={2510.12798},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2510.12798},
+}
+```
````
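For reference, metadata-only changes like the `library_name` and `license` additions above can also be proposed programmatically with `huggingface_hub`'s `metadata_update`, which edits the YAML front matter and can open a pull request much like this one. A minimal sketch; the repo id is a placeholder, not taken from this PR:

```python
# Sketch: propose model-card metadata edits via huggingface_hub.
# The repo id is a placeholder/assumption, not confirmed by this PR.
from huggingface_hub import metadata_update

metadata_update(
    "IDEA-Research/Rex-Omni",  # hypothetical repo id
    {"library_name": "transformers", "license": "other"},
    create_pr=True,  # open a PR rather than committing to main directly
    commit_message="Add library_name and license metadata",
)
```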
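The new intro paragraph asserts compatibility with the Hugging Face `transformers` library, and the metadata now declares `base_model: Qwen/Qwen2.5-VL-3B-Instruct` with `pipeline_tag: image-text-to-text`. A minimal loading sketch under those assumptions, following the standard Qwen2.5-VL processor flow; the repo id, image path, and prompt are illustrative only:

```python
# Sketch: load and run the model, assuming it behaves like its Qwen2.5-VL base.
# Repo id, image path, and prompt are placeholders, not taken from this card.
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "IDEA-Research/Rex-Omni"  # hypothetical repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

image = Image.open("example.jpg")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Detect every person in the image."},
    ],
}]
# Build the chat prompt, then batch text and image through the processor.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=256)
new_tokens = generated[:, inputs["input_ids"].shape[1]:]  # strip the prompt tokens
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```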