AniPortrait - Audio-driven Realistic Portrait Animation Synthesis Technology

Today, let's take a look at a paper from Tencent this week: AniPortrait, an audio-driven technique for synthesizing realistic portrait animation.

AniPortrait aims to generate high-quality animation from audio and a reference portrait image.

The framework works in two stages.

First, it extracts a 3D facial mesh and a head pose from the audio, then projects these two elements into a sequence of 2D keypoints.
Second, a diffusion model transforms these 2D keypoints into a continuous portrait video. These two stages are trained simultaneously within the framework.
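
The projection step in the first stage can be illustrated with a small sketch. It is not the authors' implementation: it assumes a simple pinhole camera, a head pose given as Euler angles plus a translation, and hypothetical names and defaults (`project_mesh`, `focal`, `center`) chosen only for illustration.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """Build R = Rz(roll) * Ry(yaw) * Rx(pitch); angles are in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def project_mesh(vertices, pose, focal=1000.0, center=(256.0, 256.0)):
    """Project 3D mesh vertices to 2D keypoints for one frame.

    vertices: iterable of (x, y, z) points in the head's local frame.
    pose: (yaw, pitch, roll, tx, ty, tz), an assumed pose layout.
    focal, center: assumed pinhole intrinsics, not values from the paper.
    """
    yaw, pitch, roll, tx, ty, tz = pose
    r = rotation_matrix(yaw, pitch, roll)
    keypoints = []
    for x, y, z in vertices:
        # Apply the head pose: rotate, then translate into camera space.
        xc = r[0][0] * x + r[0][1] * y + r[0][2] * z + tx
        yc = r[1][0] * x + r[1][1] * y + r[1][2] * z + ty
        zc = r[2][0] * x + r[2][1] * y + r[2][2] * z + tz
        if zc <= 0:
            raise ValueError("vertex is behind the camera")
        # Perspective divide, then shift to the image center.
        keypoints.append((focal * xc / zc + center[0],
                          focal * yc / zc + center[1]))
    return keypoints


if __name__ == "__main__":
    toy_mesh = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]  # made-up vertices
    frontal_pose = (0.0, 0.0, 0.0, 0.0, 0.0, 10.0)
    print(project_mesh(toy_mesh, frontal_pose))
```

Running the projection once per audio frame, with the mesh and pose predicted for that frame, would give the keypoint sequence that the second stage consumes.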

The experimental results show the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality, giving viewers an enhanced perceptual experience. The method also shows great potential in flexibility and controllability, which makes it well suited to applications such as facial motion editing and face reenactment.

Showcase of diverse generated videos

Self-driven
Face reenactment
Audio-driven