Snap's Wonderland: Generating 3D Scenes from a Single Image

Today, let's take a look at Snap's Wonderland project, which generates 3D scenes from a single image. Compared with traditional 3D reconstruction methods, it offers efficient generation, broad applicability, and high-quality 3D scene representations.

Introduction

Wonderland is the first work to demonstrate that the latent space of a video diffusion model can be effectively used to build a 3D reconstruction model, enabling efficient 3D scene generation.

Examples

  1. One-shot generation of 3D scenes

  2. Navigating 3D scenes generated autoregressively

  3. Video generation based on camera trajectories (a minimal trajectory sketch follows this list)

  4. Scene exploration under multiple camera trajectories
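
Examples 3 and 4 hinge on the idea of a camera trajectory as the control signal. As a purely illustrative sketch of what such an input can look like, the NumPy snippet below samples a sequence of 4x4 camera poses along an orbit. The look-at convention, the orbit path, and the function names are assumptions for illustration, not Wonderland's actual input format.

```python
import numpy as np

def look_at(eye: np.ndarray, target: np.ndarray, up: np.ndarray) -> np.ndarray:
    """Build a 4x4 camera-to-world pose looking from `eye` toward `target`.
    (One common convention; the camera looks down its local -Z axis.)"""
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1] = right, true_up
    pose[:3, 2], pose[:3, 3] = -forward, eye
    return pose

def orbit_trajectory(radius: float, height: float, n_frames: int) -> np.ndarray:
    """Sample `n_frames` poses on a circular orbit around the origin."""
    poses = []
    for theta in np.linspace(0.0, 2 * np.pi, n_frames, endpoint=False):
        eye = np.array([radius * np.cos(theta), height, radius * np.sin(theta)])
        poses.append(look_at(eye, np.zeros(3), np.array([0.0, 1.0, 0.0])))
    return np.stack(poses)  # (n_frames, 4, 4) camera extrinsics

trajectory = orbit_trajectory(radius=2.0, height=0.5, n_frames=16)
print(trajectory.shape)  # (16, 4, 4)
```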

Methodology

Given a single image, a camera-guided video diffusion model generates 3D-aware video latents that follow a specified camera trajectory. These latents are then fed to a latent-based large reconstruction model (LaLRM), which builds the 3D scene in a single feed-forward pass. The video diffusion model uses a dual-branch camera modulation mechanism for precise control over camera poses, while LaLRM operates entirely within the latent space, efficiently reconstructing expansive, high-fidelity 3D scenes.
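
To make the two-stage design concrete, here is a minimal PyTorch-style sketch of how the pieces might fit together. Everything here is an assumption for illustration: the class names, tensor shapes, number of denoising steps, the FiLM-plus-cross-attention reading of the "dual-branch camera modulation", and the per-token scene-parameter output of LaLRM are all stand-ins, not Wonderland's actual architecture.

```python
import torch
import torch.nn as nn

class DualBranchCameraModulation(nn.Module):
    """One possible reading of the dual-branch mechanism: a FiLM-style
    scale/shift branch plus a cross-attention branch over per-frame pose
    tokens. (Assumed interpretation, not the paper's exact design.)"""
    def __init__(self, dim: int, pose_dim: int = 16):
        super().__init__()
        self.film = nn.Linear(pose_dim, 2 * dim)   # branch 1: global scale/shift
        self.pose_proj = nn.Linear(pose_dim, dim)  # branch 2: pose tokens
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, latents: torch.Tensor, poses: torch.Tensor) -> torch.Tensor:
        # latents: (B, N, dim) video latent tokens; poses: (B, T, pose_dim)
        scale, shift = self.film(poses.mean(dim=1)).chunk(2, dim=-1)
        latents = latents * (1 + scale[:, None, :]) + shift[:, None, :]
        pose_tokens = self.pose_proj(poses)
        attended, _ = self.attn(latents, pose_tokens, pose_tokens)
        return latents + attended

class CameraGuidedVideoDiffusion(nn.Module):
    """Stand-in for stage 1: iteratively denoises video latents conditioned
    on the input image tokens and the camera trajectory."""
    def __init__(self, dim: int = 256, pose_dim: int = 16, steps: int = 4):
        super().__init__()
        self.steps = steps
        self.camera_mod = DualBranchCameraModulation(dim, pose_dim)
        self.denoiser = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, image_tokens: torch.Tensor, poses: torch.Tensor) -> torch.Tensor:
        latents = torch.randn_like(image_tokens)  # start from noise
        for _ in range(self.steps):
            latents = self.camera_mod(latents + image_tokens, poses)
            latents = self.denoiser(latents)
        return latents  # 3D-aware video latents

class LaLRMSketch(nn.Module):
    """Stand-in for stage 2: maps video latents to scene parameters in a
    single feed-forward pass (the output format is an assumption)."""
    def __init__(self, dim: int = 256, scene_params_per_token: int = 14):
        super().__init__()
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(dim, scene_params_per_token)

    def forward(self, video_latents: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(video_latents))

if __name__ == "__main__":
    B, N, T, dim = 1, 64, 8, 256          # batch, latent tokens, frames, width
    image_tokens = torch.randn(B, N, dim)  # encoded single input image
    poses = torch.randn(B, T, 16)          # flattened 4x4 camera extrinsics
    diffusion = CameraGuidedVideoDiffusion(dim)
    lalrm = LaLRMSketch(dim)
    scene = lalrm(diffusion(image_tokens, poses))
    print(scene.shape)  # (1, 64, 14) per-token scene parameters
```

The design choice the section emphasizes is that reconstruction happens in the latent space rather than in pixel space: LaLRM never needs decoded video frames, which is what makes the feed-forward stage efficient.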