arxiv:2511.19319

SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis

Published on Nov 24

Submitted by levon dang on Nov 25

Authors: Lingwei Dang, Juntong Li, Yebin Liu

Abstract

AI-generated summary: SyncMV4D generates realistic and consistent multi-view 3D Hand-Object Interaction videos and 4D motions by integrating visual priors, motion dynamics, and multi-view geometry.

Hand-Object Interaction (HOI) generation plays a critical role in advancing applications across animation and robotics. Current video-based methods are predominantly single-view, which impedes comprehensive 3D geometry perception and often results in geometric distortions or unrealistic motion patterns. While 3D HOI approaches can generate dynamically plausible motions, their dependence on high-quality 3D data captured in controlled laboratory settings severely limits their generalization to real-world scenarios. To overcome these limitations, we introduce SyncMV4D, the first model that jointly generates synchronized multi-view HOI videos and 4D motions by unifying visual priors, motion dynamics, and multi-view geometry. Our framework features two core innovations: (1) a Multi-view Joint Diffusion (MJD) model that co-generates HOI videos and intermediate motions, and (2) a Diffusion Points Aligner (DPA) that refines the coarse intermediate motion into globally aligned 4D metric point tracks. To tightly couple 2D appearance with 4D dynamics, we establish a closed-loop, mutually enhancing cycle: during the diffusion denoising process, the generated video conditions the refinement of the 4D motion, while the aligned 4D point tracks are reprojected to guide the next step of joint generation. Experimentally, our method outperforms state-of-the-art alternatives in visual realism, motion plausibility, and multi-view consistency.
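The abstract mentions that aligned 4D point tracks are reprojected into the views to guide the next denoising step, but this page does not spell out the projection model. As a rough illustration only, the sketch below assumes a standard pinhole camera per view and projects a sequence of 3D points into each view. The names `PinholeCamera` and `reproject_tracks`, the camera parameters, and the sample point are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


class PinholeCamera:
    """Hypothetical pinhole camera (an assumption, not the paper's model).

    R is a 3x3 world-to-camera rotation (row-major), t a translation,
    and fx, fy, cx, cy are the usual intrinsics in pixels.
    """

    def __init__(self, R: Sequence[Sequence[float]], t: Sequence[float],
                 fx: float, fy: float, cx: float, cy: float) -> None:
        self.R, self.t = R, t
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy

    def project(self, p: Point3D) -> Pixel:
        # World coordinates -> camera coordinates: X_cam = R @ X_world + t
        cam = [sum(self.R[i][j] * p[j] for j in range(3)) + self.t[i]
               for i in range(3)]
        x, y, z = cam
        if z <= 0:
            raise ValueError("point is behind the camera")
        # Perspective division followed by the intrinsics
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


def reproject_tracks(tracks: List[List[Point3D]],
                     cameras: List[PinholeCamera]) -> List[List[List[Pixel]]]:
    """tracks[f][n] is point n at frame f; returns pixels[v][f][n] per view v."""
    return [[[cam.project(p) for p in frame] for frame in tracks]
            for cam in cameras]


# Toy example: one frame, one point, two views that differ by a sideways shift.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cam_a = PinholeCamera(identity, [0.0, 0.0, 0.0], 100.0, 100.0, 50.0, 50.0)
cam_b = PinholeCamera(identity, [1.0, 0.0, 0.0], 100.0, 100.0, 50.0, 50.0)
tracks = [[(0.5, 0.25, 2.0)]]
views = reproject_tracks(tracks, [cam_a, cam_b])
print(views[0][0][0], views[1][0][0])
```

In a closed loop of the kind the abstract describes, per-view 2D tracks like `views` would be one possible form of guidance signal for the next joint generation step; how SyncMV4D actually encodes and injects that signal is not stated on this page.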

Community

levondang

Paper author Paper submitter 7 days ago

TL;DR: A novel method for synchronously generating multi-view hand-object interaction videos and 4D motion.
Project page at https://droliven.github.io/SyncMV4D/.
Video demonstration: https://youtu.be/G7pda3nmV70.
