Abstract

Semantic Scene Completion (SSC) aims to reconstruct complete 3D scenes with precise voxel-wise semantics from single-view, incomplete input data, a crucial but highly challenging problem for scene understanding. Although SSC has seen significant progress in recent years with the introduction of 2D semantic priors, occluded parts, especially the rear view of a scene, are still poorly completed and segmented. To ameliorate this issue, we propose a novel deep learning framework for 3D SSC, named the Planar Convolution and Attention-based Network (PCANet), which effectively extends high-precision predictions from the front-view surface to the rear-view occluded areas. Specifically, we decompose the traditional 3D convolutional layer into three successive planar convolutions to form a Planar Convolution Residual (PCR) block, which preserves the planar features of the 3D scene. We then propose a Planar Attention Module (PAM) that captures attention along the three orthogonal planes and harvests global context from the front surface to the rear occluded areas, improving overall accuracy. Extensive experiments on the real NYU and NYUCAD datasets and the synthetic SUNCG-RGBD dataset demonstrate that our framework, trained end-to-end without additional data, generates high-quality SSC results in both front and rear views and outperforms state-of-the-art approaches.
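
The abstract describes the PCR block only at a high level. The following is a minimal PyTorch sketch of how a standard 3×3×3 convolution might be decomposed into three successive planar convolutions with a residual connection; the kernel sizes, channel width, activation choice, and the `PCRBlock` name are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of a Planar Convolution Residual (PCR) block, assuming a
# PyTorch implementation. Hyperparameters here are illustrative guesses.
import torch
import torch.nn as nn


class PCRBlock(nn.Module):
    """Residual block replacing one 3x3x3 convolution with three successive
    planar convolutions, one per orthogonal plane of the voxel grid."""

    def __init__(self, channels: int):
        super().__init__()
        # Each kernel is flat along one axis, so each convolution
        # aggregates features within a single plane of the volume.
        self.conv_xy = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.conv_xz = nn.Conv3d(channels, channels, kernel_size=(3, 1, 3), padding=(1, 0, 1))
        self.conv_yz = nn.Conv3d(channels, channels, kernel_size=(3, 3, 1), padding=(1, 1, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv_xy(x))
        out = self.relu(self.conv_xz(out))
        out = self.conv_yz(out)
        return self.relu(out + x)  # residual connection


# Example: one 32-channel voxel feature volume of size 60x36x60 (assumed shape).
feats = torch.randn(1, 32, 60, 36, 60)
block = PCRBlock(channels=32)
print(block(feats).shape)  # torch.Size([1, 32, 60, 36, 60])
```

Because each kernel spans only one plane, the stack of three convolutions covers the full 3D neighborhood while keeping each step plane-aligned, which is the property the paper credits with preserving the planar structure of indoor scenes.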
