arxiv:2412.01064

FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Published on Dec 2, 2024
· Submitted by AK on Dec 3, 2024
#2 Paper of the day
Abstract

The FLOAT method generates high-quality, temporally consistent, and emotion-enhanced talking portraits by applying flow matching in a learned motion latent space with a transformer-based vector field predictor.

AI-generated summary

With the rapid advancement of diffusion-based generative models, portrait image animation has achieved remarkable results. However, it still faces challenges in temporally consistent video generation and fast sampling because diffusion models sample iteratively. This paper presents FLOAT, an audio-driven talking portrait video generation method based on a flow matching generative model. We shift the generative modeling from the pixel-based latent space to a learned motion latent space, enabling efficient design of temporally consistent motion. To achieve this, we introduce a transformer-based vector field predictor with a simple yet effective frame-wise conditioning mechanism. Additionally, our method supports speech-driven emotion enhancement, enabling a natural incorporation of expressive motions. Extensive experiments demonstrate that our method outperforms state-of-the-art audio-driven talking portrait methods in terms of visual quality, motion fidelity, and efficiency.
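As a rough illustration of the flow matching idea the abstract refers to, here is a minimal NumPy sketch of the generic linear-path formulation: a training pair (interpolated point and target velocity) and Euler integration of a vector field from noise to a sample. This is not FLOAT's implementation. The function names, tensor shapes, the `audio_cond` array, and the `oracle_field` that stands in for a learned transformer predictor are all assumptions made for the example.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Linear path between noise x0 and data x1 at time t in [0, 1].

    Returns the interpolated point x_t and the velocity a predictor
    would be trained to regress (generic flow matching, assumed here).
    """
    x_t = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0
    return x_t, target_velocity

def euler_sample(vector_field, x0, cond, num_steps=10):
    """Integrate dx/dt = vector_field(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * vector_field(x, t, cond)
    return x

# Hypothetical shapes: 4 frames, 8-dim motion latent, 16-dim per-frame audio feature.
rng = np.random.default_rng(0)
num_frames, latent_dim = 4, 8
noise = rng.standard_normal((num_frames, latent_dim))
target_motion = rng.standard_normal((num_frames, latent_dim))
audio_cond = rng.standard_normal((num_frames, 16))

def oracle_field(x, t, cond):
    # Stand-in for a learned predictor: returns the exact straight-line velocity.
    # A real model would compute this from x, t, and the per-frame condition.
    return target_motion - noise

x_half, velocity = flow_matching_pair(noise, target_motion, 0.5)
sample = euler_sample(oracle_field, noise, audio_cond, num_steps=10)
```

With the oracle field the Euler path is a straight line, so `sample` lands on `target_motion` after any number of steps. A learned field is only approximately straight, which is why the step count trades speed against quality.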

Community

No code no model :(

Still no code or model?

Yea still no code :)

Paper author

Hi guys,
Sorry for the late update. The inference code and checkpoints are released.
https://github.com/deepbrainai-research/float

Is there any way to make videos 1024x1024 instead of 512x512?
