Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
Abstract
Semantic Score Distillation Sampling (SemanticSDS) improves the accuracy and expressiveness of compositional text-to-3D generation by using view-consistent semantic embeddings and a semantic map to guide region-specific distillation from pre-trained diffusion models.
Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research. Due to the scarcity of 3D data, state-of-the-art approaches utilize pre-trained 2D diffusion priors, optimized through Score Distillation Sampling (SDS). Despite progress, crafting complex 3D scenes featuring multiple objects or intricate interactions is still difficult. To tackle this, recent methods have incorporated box or layout guidance. However, these layout-guided compositional methods often struggle to provide fine-grained control, as they are generally coarse and lack expressiveness. To overcome these challenges, we introduce a novel SDS approach, Semantic Score Distillation Sampling (SemanticSDS), designed to effectively improve the expressiveness and accuracy of compositional text-to-3D generation. Our approach integrates new semantic embeddings that maintain consistency across different rendering views and clearly differentiate between various objects and parts. These embeddings are transformed into a semantic map, which directs a region-specific SDS process, enabling precise optimization and compositional generation. By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models, thereby achieving superior quality in 3D content generation, particularly for complex objects and scenes. Experimental results demonstrate that our SemanticSDS framework is highly effective for generating state-of-the-art complex 3D content. Code: https://github.com/YangLing0818/SemanticSDS-3D
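To make the region-specific SDS idea in the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that per-pixel semantic embeddings are turned into a semantic map by nearest-prototype assignment, and that each region's SDS residual (predicted noise minus injected noise) is computed under that region's own prompt and masked before summation. The names `semantic_map_from_embeddings`, `region_sds_grad`, and the `denoiser` callable are hypothetical stand-ins; a real pipeline would call a pre-trained 2D diffusion model and a differentiable 3D renderer.

```python
import numpy as np

def semantic_map_from_embeddings(embeddings, prototypes):
    """Turn rendered per-pixel embeddings into one-hot region masks.

    Assumption for this sketch: each pixel belongs to the region whose
    prototype embedding is closest in squared Euclidean distance.
    embeddings: (H, W, D), prototypes: (K, D) -> masks: (K, H, W)
    """
    diff = embeddings[:, :, None, :] - prototypes[None, None, :, :]
    labels = (diff ** 2).sum(axis=-1).argmin(axis=-1)  # (H, W) region ids
    region_ids = np.arange(prototypes.shape[0])[:, None, None]
    return (labels[None, :, :] == region_ids).astype(np.float64)

def region_sds_grad(x_t, noise, masks, prompts, denoiser, weight=1.0):
    """Combine per-region SDS residuals using the semantic masks.

    x_t:      noised rendering, shape (H, W, C)
    noise:    the noise that was added to the rendering, shape (H, W, C)
    masks:    (K, H, W) region masks from the semantic map
    prompts:  K region prompts (one per mask)
    denoiser: hypothetical callable (x_t, prompt) -> predicted noise
    """
    grad = np.zeros_like(x_t)
    for mask, prompt in zip(masks, prompts):
        eps_pred = denoiser(x_t, prompt)          # noise predicted for this prompt
        residual = weight * (eps_pred - noise)    # standard SDS residual
        grad += mask[..., None] * residual        # keep it only inside the region
    return grad
```

In this sketch each pixel receives guidance only from the prompt of the region it is assigned to, which is one simple way to read "a semantic map directs a region-specific SDS process". Soft masks or overlapping regions would need a different combination rule.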
 Ling Yang