
AnchorCrafter: Animating online streamers promoting products through human-object interaction video generation technology

and several universities. At present only the paper has been released; the code has not yet been open-sourced.

AnchorCrafter is a diffusion-based generative system that creates 2D videos featuring a target portrait and a customized product, with high visual fidelity and controllable human-object interaction.

Key innovations

  1. Improves recognition of object appearance from arbitrary multi-view inputs while decoupling the appearance features of the object from those of the human.

  2. Achieves complex human-object interactions by addressing object trajectory control and occlusion handling.

  3. Introduces a new training objective that emphasizes the learning of object details.


Method

1. The training process of AnchorCrafter

AnchorCrafter is based on a video diffusion model and achieves high-quality human-object interaction video generation in the following ways:

  • Features of the human body and of multi-view object images are injected into the video generation process so that the appearances of both are faithfully reproduced in the output.
  • A dedicated motion-control module governs the object's trajectory so that it moves naturally with the human's movements.
  • The training objective assigns higher weights to human-object interaction regions, which strengthens the learning of object details.
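The article does not give the formula for the reweighted objective. The following is a minimal sketch, assuming a standard noise-prediction MSE and a binary mask that marks the human-object interaction region at latent resolution: elements inside the mask are weighted by a hypothetical `region_weight`, all other elements by 1. The function name, the mask format, and the weighting scheme are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np


def hoi_reweighted_mse(pred_noise, true_noise, hoi_mask, region_weight=2.0):
    """Noise-prediction MSE with extra weight on human-object interaction pixels.

    Hypothetical sketch, not AnchorCrafter's published loss.
    pred_noise, true_noise: arrays of shape (frames, channels, height, width).
    hoi_mask: array of shape (frames, 1, height, width); 1 inside the HOI region, 0 elsewhere.
    region_weight: assumed weight applied inside the mask (background keeps weight 1).
    """
    # Per-element weights: 1 outside the HOI region, region_weight inside it.
    weights = 1.0 + (region_weight - 1.0) * hoi_mask
    # Broadcasting expands the single mask channel across all latent channels.
    squared_error = (pred_noise - true_noise) ** 2
    # Plain mean over every element; some implementations normalize by the weight sum instead.
    return float(np.mean(weights * squared_error))
```

With `region_weight=1.0` the function reduces to ordinary MSE; raising it makes errors on the hands and the product cost more than equal-sized errors in the background, which is the effect the bullet describes.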

2. HOI-Appearance Perception

The HOI-Appearance Perception module aims to efficiently integrate the appearance features of humans and objects:

  • Target object features (f_O) are extracted by fusing features from multiple views of the object.
  • The object feature (f_O) is combined with the human reference feature (f_H) to form a dual human-object feature representation.
  • Decoupling the appearance features of the human and the object improves the consistency of the object's appearance in generated videos and the reproduction of its details across viewing angles.
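The article does not specify how the views are fused or how f_O and f_H are combined. The sketch below assumes one plausible design: per-view object tokens are concatenated into a single sequence f_O, and video latent tokens attend to f_H and f_O through two separate cross-attention branches whose outputs are summed (decoupled cross-attention, as popularized by IP-Adapter). This is a swapped-in technique for illustration, not confirmed as AnchorCrafter's. Learned projections, multiple heads, and the image encoders that produce the tokens are omitted, and all function names and the `object_scale` parameter are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_views(view_tokens):
    """Hypothetical multi-view fusion: concatenate per-view tokens into f_O.

    view_tokens: (n_views, n_tokens, dim). Concatenating along the token axis
    keeps every view visible to attention instead of averaging details away.
    """
    n_views, n_tokens, dim = view_tokens.shape
    return view_tokens.reshape(n_views * n_tokens, dim)


def cross_attention(queries, context):
    """Single-head scaled dot-product attention without learned projections."""
    scores = queries @ context.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ context


def dual_cross_attention(latent_tokens, f_h, f_o, object_scale=1.0):
    """Attend to human and object features in separate branches, then sum.

    Separate branches are one way to keep human and object appearance
    decoupled; object_scale is an assumed knob for the object branch.
    """
    human_out = cross_attention(latent_tokens, f_h)
    object_out = cross_attention(latent_tokens, f_o)
    # Residual connection: the latent tokens are updated, not replaced.
    return latent_tokens + human_out + object_scale * object_out
```

For example, four object views of eight 16-dimensional tokens each yield a 32-token f_O, and ten latent tokens passed through `dual_cross_attention` keep their shape `(10, 16)`. Because the two branches never share a softmax, human tokens cannot crowd out object tokens in the attention weights, which is the intuition behind decoupling.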

Comparison