MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
Abstract
The Mixture-of-Attention (MoA) architecture personalizes text-to-image diffusion models by blending a fixed prior attention branch with a learnable personalized branch, improving subject-context control.
We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch. MoA is designed to retain the original model's prior by fixing its attention layers in the prior branch, while minimally intervening in the generation process with the personalized branch that learns to embed subjects in the layout and context generated by the prior branch. A novel routing mechanism manages the distribution of pixels in each layer across these branches to optimize the blend of personalized and generic content creation. Once trained, MoA facilitates the creation of high-quality, personalized images featuring multiple subjects with compositions and interactions as diverse as those generated by the original model. Crucially, MoA enhances the distinction between the model's pre-existing capability and the newly augmented personalized intervention, thereby offering a more disentangled subject-context control that was previously unattainable. Project page: https://snap-research.github.io/mixture-of-attention
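The routing idea described in the abstract can be illustrated with a toy, pure-Python sketch. This is not the authors' implementation. It assumes that both branches attend from the same pixel queries and differ only in their keys and values, and that the router is a single linear layer followed by a softmax over the two branches (soft Mixture-of-Experts-style gating). The names `mixture_of_attention`, `prior_kv`, `personal_kv`, and `router_w` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical alias for this sketch: rows are pixels/tokens, columns are channels.
Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention, softmax(q k^T / sqrt(d)) v, computed row by row."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


def mixture_of_attention(
    x: Matrix,
    prior_kv: Tuple[Matrix, Matrix],
    personal_kv: Tuple[Matrix, Matrix],
    router_w: Matrix,
) -> Tuple[Matrix, Matrix]:
    """Blend a frozen prior branch and a personalized branch with a per-pixel router.

    x           : n x d pixel features, used as queries by both branches (assumption)
    prior_kv    : (keys, values) for the frozen prior branch
    personal_kv : (keys, values) for the learnable personalized branch
    router_w    : d x 2 weights of a hypothetical linear router
    Returns the mixed output (n rows) and the n x 2 routing weights.
    """
    prior_out = attention(x, *prior_kv)        # original model's attention, kept fixed
    personal_out = attention(x, *personal_kv)  # branch that injects the subject
    mixed, gates = [], []
    for i, xi in enumerate(x):
        # Router logits for this pixel: one score per branch.
        logits = [sum(xi[c] * router_w[c][e] for c in range(len(xi))) for e in range(2)]
        g = softmax(logits)  # soft assignment of this pixel to the two branches
        gates.append(g)
        mixed.append([g[0] * p + g[1] * s for p, s in zip(prior_out[i], personal_out[i])])
    return mixed, gates
```

With an all-zero `router_w`, every pixel receives routing weights (0.5, 0.5) and the output is the average of the two branches. Pushing the router logits strongly toward the first branch recovers the prior branch's output, mirroring the abstract's goal of leaving layout and context to the frozen prior while the personalized branch handles the subject.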
Community
Similar papers recommended by the Semantic Scholar API (via Librarian Bot):
- MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration (2024)
- IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models (2024)
- MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation (2024)
- Attention Calibration for Disentangled Text-to-Image Personalization (2024)
- OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models (2024)
 AK