LayerDiffuse - AI Generated Transparent Images

Today I looked at a Stable Diffusion (SD) extension for generating images with transparent backgrounds. The paper was released at the end of last month by a lab at Stanford University.

https://arxiv.org/abs/2402.17113

Introduction

LayerDiffuse enables large-scale pretrained latent diffusion models to generate transparent images, either as a single transparent image or as multiple transparent layers. It learns a "latent transparency" that encodes alpha-channel transparency into the latent manifold of the pretrained diffusion model. Because the transparency is added in the form of a latent offset, the latent distribution of the original pretrained model changes only minimally, preserving its production-ready quality. In this way, any latent diffusion model can be converted into a transparent image generator by fine-tuning it on the adjusted latent space.
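The latent-offset idea can be sketched numerically. Everything below is a hypothetical toy, not the paper's actual code: a random tensor stands in for a frozen SD latent, and a small function stands in for the learned transparency encoder. The point is only that the perturbation is small relative to the latent, so the pretrained latent distribution is barely disturbed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a latent from a frozen pretrained encoder and a
# small learned offset that encodes the alpha channel into that latent.
latent = rng.standard_normal((4, 64, 64))   # toy SD-style latent
alpha = rng.random((64, 64))                # toy alpha channel in [0, 1]

def transparency_offset(alpha, scale=0.05):
    # Keep the perturbation small so the adjusted latent stays close to
    # the pretrained model's latent distribution.
    return scale * np.broadcast_to(alpha - 0.5, (4, 64, 64))

adjusted = latent + transparency_offset(alpha)

# The offset is tiny relative to the latent itself.
ratio = np.linalg.norm(adjusted - latent) / np.linalg.norm(latent)
print(ratio < 0.05)  # True
```

In the real method the offset comes from a trained network rather than a fixed linear rescaling, but the "small offset on a frozen latent" structure is the same.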

Usage Effects

LayerDiffuse can be used in the Stable Diffusion WebUI (Forge version), and community members have contributed workflows for ComfyUI. The official examples currently cover text-to-image only; image-to-image support has not yet been released.

Usage Scenarios

  1. Generate transparent images only (attention injection): This scenario focuses solely on generating foreground objects with transparency, without any textual or image prompts for the foreground or background.


  2. Generate all elements simultaneously: In this scenario, the model generates a complete image based on textual prompts for the foreground, background, and blended image, while also outputting a transparent foreground object, background image, and blended image.


  3. From background to blended image generation: Here, the model generates the blended image from the background image, without separately producing a transparent foreground object.


  4. From foreground to blended image generation: In this scenario, the model generates the blended image based on the image prompt of the foreground object, without directly generating the background image.


  5. Generation from background to foreground: The model receives textual prompts for the background and image prompts for the foreground, then generates the foreground object and a composite image while retaining the ability to process the transparency of the foreground object.


  6. Generation from background and composite image to foreground: In this scenario, the model uses image prompts from the background and the composite image to generate the foreground object, focusing on the transparency processing of the foreground.


  7. Generation from foreground to background: The model uses the image of the foreground object and textual prompts for the background to generate the background image and composite image, without directly generating a transparent foreground object.


  8. Generation from foreground and composite image to background: In this scenario, the model generates the background image based on image prompts from the foreground and the composite image, without directly handling the generation of a transparent foreground object.

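Several of the scenarios above produce a "blended image" by combining a transparent foreground with a background. The combination step itself is standard "over" compositing; a minimal sketch with toy 2x2 images:

```python
import numpy as np

def composite_over(fg_rgb, fg_alpha, bg_rgb):
    """Standard 'over' compositing: blend = fg * a + bg * (1 - a)."""
    return fg_rgb * fg_alpha + bg_rgb * (1.0 - fg_alpha)

# Toy images: a white foreground at 50% opacity over a black background.
fg = np.ones((2, 2, 3))
alpha = np.full((2, 2, 1), 0.5)
bg = np.zeros((2, 2, 3))

blend = composite_over(fg, alpha, bg)
print(blend[0, 0])  # [0.5 0.5 0.5]
```

LayerDiffuse's contribution is generating the transparent foreground (and, in some scenarios, the background and blend) natively, rather than this compositing arithmetic, which is unchanged.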

Technical Details

A human-in-the-loop collection scheme was used to gather one million pairs of transparent image layers for training. LayerDiffuse demonstrates that latent transparency can be applied to different open-source image generators, or adapted to various conditional control systems, enabling applications such as layer generation under foreground/background conditions, joint layer generation, and structural control of layer content.

The framework takes an input transparent image and encodes a "latent transparency" to adjust the latent space of Stable Diffusion. The adjusted latent image can be decoded to reconstruct the color and alpha channels. This latent space with transparency can further be used for training or fine-tuning pre-trained image diffusion models.
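The encode/decode round trip described above can be illustrated with a deliberately tiny toy. Nothing here is the paper's architecture: a 2x2 invertible matrix stands in for the transparency encoder, mixing a color statistic and an alpha value into a 2-D "latent" from which both can be decoded back out.

```python
import numpy as np

# Toy round-trip sketch (hypothetical): mix a color statistic and an alpha
# value into a small "latent", then decode both channels back out.
W = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # toy invertible "encoder" mixing matrix

def encode(rgb_mean, alpha):
    return W @ np.array([rgb_mean, alpha])

def decode(z):
    rgb_mean, alpha = np.linalg.inv(W) @ z
    return rgb_mean, alpha

z = encode(0.7, 0.3)
r, a = decode(z)
print(round(float(r), 6), round(float(a), 6))  # 0.7 0.3
```

The real framework uses learned neural encoders and decoders over full images, but the invariant is the same: both color and alpha must be recoverable from the adjusted latent.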

Comparison

In 97% of cases, users preferred transparent content natively generated by LayerDiffuse over previous ad-hoc workflows, such as generating an image and then cutting out the subject.

Users also reported that the quality of transparent images generated by LayerDiffuse rivals that of commercial stock photo libraries (such as Adobe Stock) that provide transparent backgrounds.