nielsr (HF Staff) committed
Commit 97c85b1 · verified · Parent: 018e402

Add pipeline tag and paper information

This PR improves the model card by adding the `pipeline_tag`, ensuring the model is correctly classified. It also adds a link to the associated paper and its abstract for more information.

Files changed (1): README.md (+13 -6)
README.md CHANGED
@@ -1,16 +1,16 @@
 ---
-tags:
-- text-to-image
-- stable-diffusion
-license: apache-2.0
 language:
 - en
 library_name: diffusers
+license: apache-2.0
+tags:
+- text-to-image
+- stable-diffusion
+pipeline_tag: image-to-image
 ---
 
 # IP-Adapter Model Card
 
-
 <div align="center">
 
 [**Project Page**](https://ip-adapter.github.io) **|** [**Paper (ArXiv)**](https://arxiv.org/abs/2308.06721) **|** [**Code**](https://github.com/tencent-ailab/IP-Adapter)
@@ -18,7 +18,6 @@ library_name: diffusers
 
 ---
 
-
 ## Introduction
 
 we present IP-Adapter, an effective and lightweight adapter to achieve image prompt capability for the pre-trained text-to-image diffusion models. An IP-Adapter with only 22M parameters can achieve comparable or even better performance to a fine-tuned image prompt model. IP-Adapter can be generalized not only to other custom models fine-tuned from the same base model, but also to controllable generation using existing controllable tools. Moreover, the image prompt can also work well with the text prompt to accomplish multimodal image generation.
@@ -44,3 +43,11 @@ More information can be found [here](https://laion.ai/blog/giant-openclip/)
 - [ip-adapter_sdxl_vit-h.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter_sdxl_vit-h.bin): same as ip-adapter_sdxl, but use OpenCLIP-ViT-H-14
 - [ip-adapter-plus_sdxl_vit-h.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter-plus_sdxl_vit-h.bin): use patch image embeddings from OpenCLIP-ViT-H-14 as condition, closer to the reference image than ip-adapter_xl and ip-adapter_sdxl_vit-h
 - [ip-adapter-plus-face_sdxl_vit-h.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter-plus-face_sdxl_vit-h.bin): same as ip-adapter-plus_sdxl_vit-h, but use cropped face image as condition
+
+## Paper
+
+The model was presented in the paper [Conceptrol: Concept Control of Zero-shot Personalized Image Generation](https://huggingface.co/papers/2503.06568).
+
+## Abstract
+
+Personalized image generation with text-to-image diffusion models generates unseen images based on reference image content. Zero-shot adapter methods such as IP-Adapter and OminiControl are especially interesting because they do not require test-time fine-tuning. However, they struggle to balance preserving personalized content and adherence to the text prompt. We identify a critical design flaw resulting in this performance gap: current adapters inadequately integrate personalization images with the textual descriptions. The generated images, therefore, replicate the personalized content rather than adhere to the text prompt instructions. Yet the base text-to-image has strong conceptual understanding capabilities that can be leveraged. We propose Conceptrol, a simple yet effective framework that enhances zero-shot adapters without adding computational overhead. Conceptrol constrains the attention of visual specification with a textual concept mask that improves subject-driven generation capabilities. It achieves as much as 89% improvement on personalization benchmarks over the vanilla IP-Adapter and can even outperform fine-tuning approaches such as Dreambooth LoRA. The source code is available at https://github.com/QY-H00/Conceptrol.
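
For context on how the checkpoints listed in the diff are consumed, below is a minimal sketch of image-prompted generation with the diffusers IP-Adapter integration (assuming a recent diffusers release that provides `load_ip_adapter`; the base model, scale value, and file paths are illustrative):

```python
# Minimal sketch: image-prompted generation with IP-Adapter via diffusers.
# Assumes diffusers >= 0.22 (which introduced load_ip_adapter); paths are illustrative.
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Load one of the SDXL checkpoints listed in the model card.
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.bin",
)
pipe.set_ip_adapter_scale(0.6)  # balance between image prompt and text prompt

reference = load_image("reference.png")  # illustrative reference image path
image = pipe(
    prompt="best quality, a dog in the snow",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
image.save("output.png")
```

Note that the ViT-H variants (e.g. ip-adapter_sdxl_vit-h.bin) additionally require loading the OpenCLIP-ViT-H-14 image encoder from the repository's models/image_encoder folder and passing it to the pipeline; see the diffusers documentation for details.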
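The mechanism the abstract describes, gating image-prompt attention with a textual concept mask, can be illustrated with a rough sketch. This is a hypothetical simplification, not the official Conceptrol implementation (see the linked repository for that); the tensor shapes, the thresholding rule, and the function name are all assumptions:

```python
# Hypothetical sketch of a textual concept mask (NOT the official Conceptrol code):
# restrict where image-prompt (IP) tokens may attend, using the spatial attention
# that the text prompt's concept token already receives.
import torch

def concept_masked_ip_attention(
    ip_attn_probs: torch.Tensor,    # (batch, heads, queries, ip_tokens) attention to IP tokens
    text_attn_probs: torch.Tensor,  # (batch, heads, queries, text_tokens) attention to text tokens
    concept_token_idx: int,         # index of the text token naming the personalized concept
    threshold: float = 0.5,         # assumed binarization rule, relative to the per-map maximum
) -> torch.Tensor:
    # Attention each spatial query pays to the concept word.
    concept_attn = text_attn_probs[..., concept_token_idx]  # (B, H, Q)
    # Binarize into a concept mask over spatial locations.
    mask = (concept_attn >= threshold * concept_attn.amax(dim=-1, keepdim=True)).float()
    # Gate the image-prompt attention so personalized content is injected
    # only where the text prompt places the concept.
    return ip_attn_probs * mask.unsqueeze(-1)
```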