Enhance model card: Add pipeline tag, library name, and abstract
This PR improves the model card for DiT360 by adding:
- `pipeline_tag: text-to-image` to enhance discoverability for users looking for text-to-image generation models.
- `library_name: diffusers` to indicate compatibility with the `diffusers` library, enabling the automated "How to use" widget on the model page.
- The paper's abstract, providing a concise summary of the research directly in the model card. The GitHub repository link within the abstract has also been clarified.
These changes will make the model more informative and accessible to the Hugging Face community.
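For reference, the updated YAML frontmatter at the top of README.md would read as follows (reconstructed from the diff in this PR; field order per the updated file):

```yaml
---
base_model:
- black-forest-labs/FLUX.1-dev
license: mit
pipeline_tag: text-to-image
library_name: diffusers
---
```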
README.md
CHANGED

```diff
@@ -1,7 +1,9 @@
 ---
-license: mit
 base_model:
 - black-forest-labs/FLUX.1-dev
+license: mit
+pipeline_tag: text-to-image
+library_name: diffusers
 ---
 
 # DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
@@ -16,6 +18,9 @@ base_model:
 **DiT360** is a framework for high-quality panoramic image generation, leveraging both **perspective** and **panoramic** data in a hybrid training scheme.
 It adopts a two-level strategy—**image-level cross-domain guidance** and **token-level hybrid supervision**—to enhance perceptual realism and geometric fidelity.
 
+## Abstract
+In this work, we propose DiT360, a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. For the issues of maintaining geometric fidelity and photorealism in generation quality, we attribute the main reason to the lack of large-scale, high-quality, real-world panoramic data, where such a data-centric view differs from prior methods that focus on model design. Basically, DiT360 has several key modules for inter-domain transformation and intra-domain augmentation, applied at both the pre-VAE image level and the post-VAE token level. At the image level, we incorporate cross-domain knowledge through perspective image guidance and panoramic refinement, which enhance perceptual quality while regularizing diversity and photorealism. At the token level, hybrid supervision is applied across multiple modules, which include circular padding for boundary continuity, yaw loss for rotational robustness, and cube loss for distortion awareness. Extensive experiments on text-to-panorama, inpainting, and outpainting tasks demonstrate that our method achieves better boundary consistency and image fidelity across eleven quantitative metrics. Our code is available at: [https://github.com/Insta360-Research-Team/DiT360](https://github.com/Insta360-Research-Team/DiT360).
+
 ## 🔨 Installation
 
 Clone the repo first:
```
|