Nanyang Technological University (NTU) released a paper last month: "StructLDM: A Structured Latent Diffusion Model for 3D Human Generation."
Model demonstration
StructLDM can generate diverse, view-consistent human models and supports controllable generation and editing at different levels. For example, compositional generation can be achieved by selecting and blending five body parts from different identities; it also supports part-aware editing such as identity swapping, local clothing editing, and 3D virtual try-on (a sketch of such part-level mixing follows below). These generation and editing capabilities do not rely on specific clothing types or mask conditions, making the method broadly applicable.
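To make the idea of part-aware editing concrete, here is a minimal sketch of what mixing a structured, per-part latent between two identities could look like. Everything in it (the part names, the latent shapes, the helper functions) is a hypothetical illustration, not the paper's actual representation or API:

```python
import numpy as np

# Hypothetical structured latent: one 2D feature map per body part.
# Part names and shapes are illustrative assumptions, not from the paper.
PARTS = ["head", "upper_body", "lower_body", "arms", "shoes"]
LATENT_SHAPE = (16, 32, 32)  # (channels, height, width), assumed

def sample_structured_latent(rng):
    """Sample a random structured latent: a dict of per-part feature maps."""
    return {part: rng.standard_normal(LATENT_SHAPE).astype(np.float32)
            for part in PARTS}

def compose_latents(source, target, parts_to_swap):
    """Part-aware mixing: copy selected parts from `source` into `target`.

    Swapping only "upper_body" would correspond to local clothing editing
    or virtual try-on; swapping "head" to identity swapping.
    """
    mixed = dict(target)
    for part in parts_to_swap:
        mixed[part] = source[part]
    return mixed

rng = np.random.default_rng(0)
identity_a = sample_structured_latent(rng)
identity_b = sample_structured_latent(rng)

# Virtual try-on: identity A wearing identity B's upper-body clothing.
tryon = compose_latents(identity_b, identity_a, ["upper_body"])
print({part: feat.shape for part, feat in tryon.items()})
```

Because edits act on whole parts of the latent rather than on image pixels, the decoded result stays consistent across viewpoints, which is why no clothing-type labels or masks are needed.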
Technical details
StructLDM is a new 3D human generation model learned from collections of 2D images. Unlike existing 3D generative adversarial networks (3D GANs), it adopts a diffusion-based design paradigm built on three key components: a structured 2D latent space, a structured auto-decoder, and a structured latent diffusion model.
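As a rough illustration of how these three components could fit together at inference time, here is a minimal PyTorch-flavored sketch: a diffusion model denoises a structured 2D latent, and an auto-decoder renders it to an image. All class names, shapes, and the simplified sampling loop below are assumptions for illustration, not the paper's code:

```python
import torch
import torch.nn as nn

class StructuredAutoDecoder(nn.Module):
    """Placeholder stand-in for the structured auto-decoder: maps a
    structured 2D latent (feature planes) to an image."""
    def __init__(self, latent_ch=16, img_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, img_ch, 3, padding=1),
        )

    def forward(self, latent):
        return self.net(latent)

class LatentDenoiser(nn.Module):
    """Placeholder stand-in for the structured latent diffusion model:
    predicts the noise present in a noised structured latent."""
    def __init__(self, latent_ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_ch, 3, padding=1),
        )

    def forward(self, z_t, t):
        # A real denoiser would also embed the timestep t; omitted here.
        return self.net(z_t)

@torch.no_grad()
def generate(denoiser, decoder, steps=50, latent_shape=(1, 16, 32, 32)):
    """DDPM-like sampling sketch: start from Gaussian noise in the
    structured latent space, iteratively denoise, then decode."""
    z = torch.randn(latent_shape)
    for t in reversed(range(steps)):
        eps = denoiser(z, t)
        z = z - eps / steps  # crude update; real samplers use noise schedules
    return decoder(z)

img = generate(LatentDenoiser(), StructuredAutoDecoder())
print(img.shape)  # torch.Size([1, 3, 32, 32])
```

The key design point this sketch tries to convey is that diffusion happens in the structured latent space rather than in pixel space: the latent keeps a spatial, human-body-aligned layout, which is what enables the part-level editing described above.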
Technical comparison
Qualitative results on the UBCFashion dataset show that StructLDM generates diverse, view-consistent human models across different poses and viewpoints, covering a range of clothing styles (e.g., dresses) and hairstyles.
Code
The code has not been open-sourced yet; you can follow the repository for updates: https://github.com/TaoHuUMD/StructLDM