Drag View: Generalizable Novel View Synthesis with Unposed Imagery
Abstract
DragView is an interactive framework that generates novel views of unseen scenes using a single source image and sparse unposed multi-view images, utilizing an epipolar attention mechanism and transformer-based ray decoding without estimating camera poses.
We introduce DragView, a novel and interactive framework for generating novel views of unseen scenes. DragView initializes the new view from a single source image, and rendering is supported by a sparse set of unposed multi-view images, all executed within a single feed-forward pass. The process begins with the user dragging a source view through a local relative coordinate system. Pixel-aligned features are obtained by projecting 3D points sampled along the target ray onto the source view. We then incorporate a view-dependent modulation layer to handle occlusion during the projection. Additionally, we broaden the epipolar attention mechanism to encompass all source pixels, facilitating the aggregation of initialized coordinate-aligned point features from the other unposed views. Finally, we employ another transformer to decode ray features into final pixel intensities. Crucially, our framework relies on neither 2D prior models nor explicit estimation of camera poses. At test time, DragView generalizes to scenes unseen during training using only unposed support images, and generates photo-realistic new views along flexible camera trajectories. In our experiments, we compare DragView with recent scene representation networks operating under pose-free conditions, as well as with generalizable NeRFs subject to noisy test camera poses. DragView consistently demonstrates superior view synthesis quality while also being more user-friendly. Project page: https://zhiwenfan.github.io/DragView/.
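The projection step described in the abstract (sampling 3D points along a target ray and projecting them onto the source view) can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole camera model, the evenly spaced depth sampling, and every name below (`sample_points_on_ray`, `project_to_source`, `rotation`, `translation`, `focal`, `cx`, `cy`) are assumptions made for illustration, with the rotation and translation standing in for the relative transform implied by the user's drag.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def sample_points_on_ray(origin: Vec3, direction: Vec3,
                         near: float, far: float, n: int) -> List[Vec3]:
    """Return n points evenly spaced in depth between near and far on the ray."""
    if n < 2:
        raise ValueError("need at least two samples")
    step = (far - near) / (n - 1)
    points = []
    for i in range(n):
        depth = near + i * step
        points.append(tuple(o + depth * d for o, d in zip(origin, direction)))
    return points


def project_to_source(point: Vec3,
                      rotation: Sequence[Sequence[float]],
                      translation: Vec3,
                      focal: float, cx: float, cy: float) -> Optional[Pixel]:
    """Move a target-frame point into the source camera frame (assumed relative
    transform), then apply a pinhole projection. Returns None if the point lies
    behind the source camera."""
    x, y, z = (sum(r * p for r, p in zip(row, point)) + t
               for row, t in zip(rotation, translation))
    if z <= 0:
        return None
    return (focal * x / z + cx, focal * y / z + cy)


# Hypothetical setup: the source camera is shifted sideways relative to the target.
identity = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
samples = sample_points_on_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0, 3.0, 3)
pixels = [project_to_source(p, identity, (0.5, 0.0, 0.0), 100.0, 50.0, 50.0)
          for p in samples]
```

In this toy configuration all projected samples share the same image row in the source view, which is the epipolar line of the target ray. Features sampled at these pixel locations would play the role of the pixel-aligned features mentioned above.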
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LEAP: Liberate Sparse-view 3D Modeling from Camera Poses (2023)
- Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models (2023)
- Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views (2023)
- Light Field Diffusion for Single-View Novel View Synthesis (2023)
- 3D Reconstruction with Generalizable Neural Fields using Scene Priors (2023)