To convert a single RGB-D input image into a 3D photo, a team of researchers from Virginia Tech and Facebook developed a deep learning-based image inpainting model that can synthesize color and depth structures in regions occluded in the original view.
“Classic image-based reconstruction and rendering techniques require elaborate capture setups involving many images with large baselines, and/or special hardware,” the researchers stated in their paper, 3D Photography using Context-aware Layered Depth Inpainting. “In this work, we present a new learning-based method that generates a 3D photo from an RGB-D input. The depth can either come from dual-camera cell phone stereo or be estimated from a single RGB image.”
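The method works by locating sharp depth discontinuities in the RGB-D input and inpainting the color and depth that become visible behind them. As a minimal sketch of the detection step only (the threshold and relative-difference criterion here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def find_depth_edges(depth, threshold=0.1):
    """Mark pixels where depth jumps sharply to a neighbor.

    The paper's pipeline inpaints occluded color and depth around
    such discontinuities; this sketch shows only the detection step,
    using a hypothetical relative-difference threshold.
    """
    edges = np.zeros(depth.shape, dtype=bool)
    # Relative depth change to the right and bottom neighbors.
    dx = np.abs(np.diff(depth, axis=1)) / np.maximum(depth[:, :-1], 1e-6)
    dy = np.abs(np.diff(depth, axis=0)) / np.maximum(depth[:-1, :], 1e-6)
    # Mark both pixels on either side of a discontinuity.
    edges[:, :-1] |= dx > threshold
    edges[:, 1:] |= dx > threshold
    edges[:-1, :] |= dy > threshold
    edges[1:, :] |= dy > threshold
    return edges
```

On a depth map with a foreground object against a distant background, the returned mask traces the object's silhouette, which is where new content must be synthesized.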
Compared to previous state-of-the-art approaches, the method, which is based on a standard CNN, produces fewer visual artifacts when converting an image into a 3D photo.
“Unlike most previous approaches, we do not require predetermining a fixed number of layers. Instead, our algorithm adapts by design to the local depth-complexity of the input and generates a varying number of layers across the image,” the researchers stated. “We have validated our approach on a wide variety of photos captured in different situations.”
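The adaptive layer count can be pictured with a layered depth image (LDI), where each pixel stores a variable-length list of depth samples rather than a fixed stack. The following is an illustrative data-structure sketch under that assumption; the paper's actual LDI additionally stores connectivity between neighboring samples:

```python
from collections import defaultdict

class LayeredDepthImage:
    """Minimal layered-depth-image sketch: each pixel holds a
    variable-length, front-to-back list of (depth, color) samples,
    so depth-complex regions carry more layers than flat ones.
    """
    def __init__(self):
        # (row, col) -> sorted list of (depth, color) samples
        self.samples = defaultdict(list)

    def add(self, row, col, depth, color):
        self.samples[(row, col)].append((depth, color))
        self.samples[(row, col)].sort()  # keep nearest-first order

    def num_layers(self, row, col):
        return len(self.samples[(row, col)])

    def front(self, row, col):
        # Nearest (front-most) sample at this pixel, or None if empty.
        layers = self.samples[(row, col)]
        return layers[0] if layers else None
```

A pixel on a flat wall might hold a single sample, while a pixel near an occlusion boundary holds both the foreground surface and the inpainted background behind it.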
The model was trained on an NVIDIA V100 GPU using the cuDNN-accelerated PyTorch deep learning framework. It can be trained on any image dataset, with no annotated data required. For this project, the team used the MS COCO dataset, estimating depth with the pretrained MegaDepth model first published by Cornell University researchers in 2018.
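Because no annotations are needed, training pairs can be generated directly from unlabeled images: regions are masked out and the network learns to reconstruct them. As a generic self-supervised sketch (a random square hole here, not the authors' exact context/synthesis-region scheme):

```python
import numpy as np

def make_inpainting_pair(image, rng, hole_size=16):
    """Build a (masked_input, target, mask) training triple from an
    unannotated image by cutting out a random square hole.
    Generic self-supervised recipe for illustration only.
    """
    h, w = image.shape[:2]
    top = rng.integers(0, h - hole_size + 1)
    left = rng.integers(0, w - hole_size + 1)
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + hole_size, left:left + hole_size] = True
    masked = image.copy()
    masked[mask] = 0  # zero out the hole the network must fill
    return masked, image, mask
```

The reconstruction loss is then computed between the network's output and the original image inside the masked region, so any photo collection, such as MS COCO, can serve as training data.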
To show the potential of the project, a separate researcher from Google took the code and developed a Chrome extension that turns Instagram posts into 3D photos. Developers interested in setting up the extension can follow the instagram-3d-photo tutorial to set up and run the project using NVIDIA GPUs on the Google Cloud Platform.