Adobe's FaceLift - 3D Head Reconstruction Tool from a Single Image

FaceLift is a single-image 3D head reconstruction method recently published by Adobe. It generates a high-fidelity 3D Gaussian head representation that supports high-quality, 360-degree, full-head novel view synthesis (NVS). Paper: https://arxiv.org/pdf/2412.17812

Results

1. Single image to 3D head reconstruction


  • FaceLift achieves highly detailed 3D reconstructions from a single face image.
  • During generation, it accurately preserves the identity features of the input image, keeping the generated 3D head consistent with the original image.

2. 4D novel view synthesis for video input


  • For video input, FaceLift processes each frame individually, generating a sequence of 3D Gaussian representations, one per frame.
  • Combining this sequence of 3D representations enables dynamic 4D novel view synthesis, showing the moving head in the video from multiple angles.
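
The per-frame processing described above can be sketched as follows. This is only an illustration: `reconstruct_video`, `render_4d`, `fake_reconstruct`, and `fake_render` are hypothetical names, and the stand-in functions do no real reconstruction or rendering. They only show how a single-image reconstructor could be applied frame by frame and the results rendered from one shared viewpoint.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Head = TypeVar("Head")
View = TypeVar("View")


def reconstruct_video(
    frames: Sequence[Frame],
    reconstruct_head: Callable[[Frame], Head],
) -> List[Tuple[int, Head]]:
    """Apply a single-image reconstructor to every frame independently."""
    return [(index, reconstruct_head(frame)) for index, frame in enumerate(frames)]


def render_4d(
    sequence: Sequence[Tuple[int, Head]],
    render: Callable[[Head, float], View],
    azimuth_deg: float,
) -> List[View]:
    """Render every per-frame head from the same novel viewpoint."""
    return [render(head, azimuth_deg) for _, head in sequence]


# Hypothetical stand-ins so the sketch runs without any model.
def fake_reconstruct(frame: str) -> dict:
    return {"source": frame}


def fake_render(head: dict, azimuth_deg: float) -> str:
    return f"{head['source']}@{azimuth_deg:.0f}deg"


sequence = reconstruct_video(["frame0", "frame1", "frame2"], fake_reconstruct)
views = render_4d(sequence, fake_render, azimuth_deg=90.0)
```

Because each frame is handled on its own, the loop needs no state shared between frames. The time dimension comes only from the order of the resulting sequence.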

3. Integration with 2D face animation

  • FaceLift can be combined with 2D face animation methods such as LivePortrait.
  • Mapping the 2D face animation onto the 3D representation yields full 3D facial animation, opening up more options for animation production and virtual avatar generation.

Advantages:

  • Strong identity preservation and viewpoint consistency.
  • Although trained only on synthetic data, it generalizes well to real-world images.
  • Outperforms existing 3D head reconstruction methods, with better capture of fine facial and hair details.

Application scenarios:

  • Precise avatar generation for games and virtual characters.
  • Novel view generation and animation for dynamic video.
  • Other uses such as virtual try-on and personalized digital avatar creation.

Method overview:

FaceLift's method consists of three key steps:

1. Multi-view diffusion model generation

  • Input: a single face image.
  • Build an image-conditioned multi-view diffusion model.
  • Generate novel-view images covering the entire head, including the sides and back.
  • Use high-quality synthetic data together with pre-trained model weights so that the model can plausibly infer unseen head views with high fidelity and multi-view consistency.
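
To make "views covering the entire head" concrete, here is a minimal sketch that places cameras at evenly spaced azimuths on a circle around the head. The number of views (6), the radius, the y-up coordinate convention, and the function names are all assumptions for illustration, not values taken from FaceLift.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def orbit_camera_positions(
    num_views: int, radius: float, elevation_deg: float = 0.0
) -> List[Vec3]:
    """Evenly spaced camera positions on a circle around the origin.

    Assumed convention: the head is centered at the origin, y is up,
    and azimuth 0 places the camera on the +z axis (front of the face).
    """
    elevation = math.radians(elevation_deg)
    positions: List[Vec3] = []
    for k in range(num_views):
        azimuth = 2.0 * math.pi * k / num_views
        x = radius * math.cos(elevation) * math.sin(azimuth)
        y = radius * math.sin(elevation)
        z = radius * math.cos(elevation) * math.cos(azimuth)
        positions.append((x, y, z))
    return positions


def look_at_direction(position: Vec3) -> Vec3:
    """Unit vector pointing from the camera position toward the origin."""
    norm = math.sqrt(sum(c * c for c in position))
    return tuple(-c / norm for c in position)


# Assumed setup: 6 views at distance 2.0 from the head center.
cameras = orbit_camera_positions(num_views=6, radius=2.0)
directions = [look_at_direction(p) for p in cameras]
```

With an even number of views, one camera sits directly behind the head (azimuth 180 degrees), which is exactly the kind of view a single frontal photo never shows.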

2. Gaussian splatting reconstruction

  • Input: the multi-view images and their corresponding camera poses.
  • Generate 3D Gaussian splats representing the head.
  • Build a complete 3D Gaussian representation that accurately describes the head structure.
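
For readers unfamiliar with Gaussian splats, the sketch below shows the parameterization commonly used in 3D Gaussian splatting: each splat has a mean, per-axis scales, a rotation quaternion, an opacity, and a color, and its covariance is built as Σ = R S Sᵀ Rᵀ. The class and function names are hypothetical, a plain RGB color stands in for the view-dependent color used in practice, and nothing here is confirmed to match FaceLift's internal data layout.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Matrix3 = List[List[float]]


@dataclass
class Gaussian3D:
    mean: Tuple[float, float, float]             # center of the splat
    scale: Tuple[float, float, float]            # per-axis standard deviation
    rotation: Tuple[float, float, float, float]  # quaternion (w, x, y, z)
    opacity: float
    color: Tuple[float, float, float]            # simplified: plain RGB


def quaternion_to_matrix(q: Tuple[float, float, float, float]) -> Matrix3:
    """Convert a quaternion (normalized here) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g: Gaussian3D) -> Matrix3:
    """Sigma = R S S^T R^T with S = diag(scale)."""
    r = quaternion_to_matrix(g.rotation)
    s2 = [s * s for s in g.scale]
    return [
        [sum(r[i][k] * s2[k] * r[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


# An axis-aligned splat, and the same splat rotated 90 degrees about z.
half = math.sqrt(0.5)
aligned = Gaussian3D((0, 0, 0), (0.1, 0.2, 0.3), (1, 0, 0, 0), 0.8, (1, 1, 1))
rotated = Gaussian3D((0, 0, 0), (0.1, 0.2, 0.3), (half, 0, 0, half), 0.8, (1, 1, 1))
sigma_aligned = covariance(aligned)
sigma_rotated = covariance(rotated)
```

Building the covariance from a rotation and a scale keeps it symmetric and positive semi-definite by construction, which is why this factorization is preferred over storing the six covariance entries directly.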

3. Novel view synthesis

  • Render high-quality novel views of the entire head from the generated 3D Gaussian representation.
  • The rendered views show strong identity preservation, detail capture, and viewpoint consistency.

Key innovations:

  1. Generate complete multi-view head images conditioned on a single face image, using latent diffusion.
  2. Describe the 3D structure of the head using Gaussian splatting, improving the flexibility and accuracy of view synthesis.
  3. Efficient training strategy
     • Train the multi-view generation model on synthetic data to strengthen its generation capability.
     • Pre-train GS-LRM on the Objaverse dataset and fine-tune it on synthetic data to further improve the quality of the 3D representation.