Section 2.4 of the report discusses AI image generation. We have explored image-generation technology several times in previous articles; today we focus on the part about 3D image generation. The report highlights four 3D projects: MVDream, Instruct-NeRF2NeRF, Skoltech3D, and RealFusion.
Model effectiveness
MVDream
Instruct-NeRF2NeRF
Skoltech3D
RealFusion
MVDream
MVDream is a new-generation 3D generation system jointly developed by ByteDance and researchers from the University of California, San Diego. It overcomes several long-standing challenges in 3D image generation. Creating 3D geometry from text prompts has traditionally suffered from issues such as the "multi-face problem" (the same face, for example a front view, appearing on several sides of the generated object) and "content drift" (inconsistency of content across different views of the same 3D asset). MVDream addresses these with a multi-view diffusion model that generates a set of mutually consistent views and uses them as a prior for 3D generation.
In quantitative evaluations, images generated by MVDream achieve Inception Score (IS) and CLIP scores comparable to those of the training set, indicating high generation quality, as shown in the figure below.
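As a quick illustration of the Inception Score mentioned above, here is a minimal sketch on toy data (not MVDream's actual evaluation code): given class-probability predictions p(y|x) for a batch of generated images, IS = exp(E_x[KL(p(y|x) || p(y))]), so predictions that are individually confident yet diverse across the batch score high.

```python
import numpy as np

def inception_score(probs: np.ndarray) -> float:
    """IS = exp(mean_x KL(p(y|x) || p(y))).

    probs: (N, C) array of per-image class probabilities,
    e.g. softmax outputs of an Inception-v3 classifier.
    """
    p_y = probs.mean(axis=0, keepdims=True)  # marginal p(y)
    kl = (probs * (np.log(probs + 1e-12) - np.log(p_y + 1e-12))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Toy example: sharply peaked yet diverse predictions -> high IS
rng = np.random.default_rng(0)
n, c = 64, 10
onehot = np.eye(c)[rng.integers(0, c, n)]
sharp = 0.99 * onehot + 0.01 / c        # near one-hot rows

# Uniform predictions carry no information -> IS of exactly 1
uniform = np.full((n, c), 1.0 / c)

print(inception_score(sharp))
print(inception_score(uniform))
```

A score close to the training set's, as reported for MVDream, suggests the generated views are statistically hard to tell apart from real ones by this metric.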
Github Link🔗: https://github.com/bytedance/MVDream
Instruct-NeRF2NeRF
Instruct-NeRF2NeRF is a model developed by researchers at UC Berkeley that uses an image-based diffusion model for iterative, text-based editing of 3D scenes. Starting from a reconstructed NeRF, it edits the underlying training images according to a text instruction and re-optimizes the scene on the edited images, gradually producing an edited 3D scene that follows the instruction while remaining consistent across viewpoints, achieving higher consistency than prior leading methods, as shown in the figure below.
This model demonstrates the powerful application potential of text instructions in editing 3D images, providing a new, efficient solution for 3D geometry editing.
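The editing loop described above can be sketched as follows. This is a toy stand-in, assuming hypothetical helper functions in place of a real NeRF renderer and the diffusion-based image editor: re-render a training view, edit it according to the instruction, put the edited image back into the dataset, and keep optimizing.

```python
import numpy as np

# Toy stand-ins (hypothetical; a real pipeline would use a NeRF and a
# diffusion image editor). "Images" are plain arrays here.
def render_view(nerf_params, view_idx):
    return nerf_params[view_idx]              # "render" = lookup

def edit_with_instruction(rendered, instruction):
    # stand-in for a diffusion edit: nudge toward a target value
    target = 1.0 if "brighter" in instruction else 0.0
    return 0.5 * rendered + 0.5 * target

def nerf_update(nerf_params, dataset, lr=0.5):
    # stand-in for NeRF optimization: move renders toward the dataset
    return nerf_params + lr * (dataset - nerf_params)

rng = np.random.default_rng(1)
dataset = rng.random((8, 4, 4))   # 8 captured "views"
nerf = dataset.copy()             # pretend the NeRF already fits them

instruction = "make it brighter"
for step in range(200):
    i = step % len(dataset)                   # cycle over views
    edited = edit_with_instruction(render_view(nerf, i), instruction)
    dataset[i] = edited                       # iterative dataset update
    nerf = nerf_update(nerf, dataset)

print(float(nerf.mean()))         # drifts toward the edit target
```

Because the scene is always re-optimized against the whole (partially edited) dataset, the edit propagates consistently to every viewpoint rather than being applied to each image independently.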
Github Link🔗: https://github.com/ayaanzhaque/instruct-nerf2nerf
Skoltech3D
In 2023, an international research team released Skoltech3D, a large-scale dataset for multi-view 3D surface reconstruction. It contains 1.4 million images of 107 scenes, each captured from 100 different viewpoints under 14 different lighting conditions, a significant improvement over existing 3D reconstruction datasets, as shown in the figure below.
Such a rich dataset matters because data scarcity is often the bottleneck in 3D image processing. With its highly diverse viewpoints and lighting conditions, Skoltech3D gives researchers an unprecedented resource for exploring and improving 3D reconstruction techniques.
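To give a feel for the scale of the capture grid, here is a minimal sketch that enumerates the viewpoint-lighting combinations per scene. The directory layout is purely illustrative, not the dataset's real structure, and the 1.4 million headline figure is larger than this grid alone, presumably because images from multiple camera sensors are counted, which this sketch omits.

```python
from itertools import product

# Hypothetical index over the capture grid described above:
# 107 scenes x 100 camera viewpoints x 14 lighting setups.
scenes = [f"scene_{i:03d}" for i in range(107)]
viewpoints = range(100)
lightings = [f"light_{j:02d}" for j in range(14)]

def frame_paths(scene):
    # illustrative file layout, not the dataset's actual one
    for view, light in product(viewpoints, lightings):
        yield f"{scene}/{light}/view_{view:03d}.png"

per_scene = sum(1 for _ in frame_paths(scenes[0]))
total = per_scene * len(scenes)
print(per_scene)   # 100 * 14 = 1400 frames per scene (per sensor)
print(total)       # 149,800 frames across all 107 scenes
```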
Github Link🔗: https://github.com/Skoltech-3D/sk3d_data
RealFusion
RealFusion is a method developed by researchers at the University of Oxford that generates a complete 3D model from a single image, even though a single view does not contain enough information for a full 360-degree reconstruction. It uses an existing 2D image generator as a prior to produce plausible views of the object from other perspectives, which are then fused into a coherent 360-degree model. Compared with earlier state-of-the-art approaches such as the self-supervised Shelf-Supervised method (2021), RealFusion produces more accurate 3D reconstructions across a variety of objects, as shown in the figure below.
This offers an innovative solution when only a single viewpoint is available: synthesizing and fusing multiple generated views significantly improves the accuracy and detail of the resulting 3D model. RealFusion is therefore particularly suitable for applications that must build high-quality 3D models from limited viewpoint data.
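The core intuition, that many imperfect generated views fused together beat a single view, can be shown with a toy sketch. Everything here is a stand-in: a real pipeline uses a 2D diffusion prior and a volumetric 3D representation, not 1-D arrays and averaging.

```python
import numpy as np

# Toy illustration: a "2D generator" hallucinates noisy views of an
# object at many angles; fusing the aligned views recovers a far
# better estimate of the full object than the single input view.
rng = np.random.default_rng(0)
true_object = np.sin(np.linspace(0, 2 * np.pi, 32))  # unknown full "shape"

def generate_view(angle):
    # stand-in for a 2D image generator: a noisy render at this angle
    return np.roll(true_object, angle) + rng.normal(0, 0.3, 32)

single = generate_view(0)                                  # one photo only
views = [np.roll(generate_view(a), -a) for a in range(32)] # align views
fused = np.mean(views, axis=0)                             # fuse by averaging

err_single = float(np.abs(single - true_object).mean())
err_fused = float(np.abs(fused - true_object).mean())
print(err_single, err_fused)   # fusion cuts the error substantially
```

The averaging step is of course a gross simplification; the point is only that aggregating many generated perspectives constrains the unseen parts of the object far better than a single image can.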
Github Link🔗: https://github.com/lukemelas/realfusion
That said, several of these 3D projects still have very few stars on GitHub, and their results remain rough, so commercialization looks some way off. Let's wait and see.