Reconstructing semantic indoor scenes is a challenging task in augmented and virtual reality. The quality of scene reconstruction is limited by the complexity and occlusion of indoor scenes, owing to the difficulty of estimating the spatial structure of the scene and insufficient learning for inferring object locations. To address these challenges, we develop PesRec, an end-to-end multi-task scene reconstruction network that parameterizes indoor semantic information. PesRec incorporates a newly designed spatial layout estimator and a 3D object detector to effectively learn scene parameter features from RGB images. We also modify an object mesh generator to improve the robustness of reconstructing occluded indoor objects through point cloud optimization. Using the analyzed scene parameters and spatial structure, PesRec reconstructs an indoor scene by placing object meshes, scaled to their 3D detection boxes, within the estimated layout cuboid. Extensive experiments on two benchmark datasets demonstrate that PesRec performs strongly: it achieves an average chamfer distance of 5.24 × 10⁻³ for object reconstruction on the Pix3D dataset, as well as 53.61% mAP for 3D object detection and 79.7% 3D IoU for layout estimation on the commonly used SUN RGB-D dataset. The proposed network overcomes the limitations imposed by complex indoor scenes and occlusions, improving reconstruction quality for augmented and virtual reality applications.
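The object reconstruction result above is reported as a chamfer distance. As a point of reference, below is a minimal sketch of the standard bidirectional chamfer distance between sampled point clouds, assuming squared nearest-neighbor distances averaged per cloud; the exact sampling density and normalization used for the Pix3D evaluation are defined in the full paper and may differ.

```python
import numpy as np

def chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """Symmetric chamfer distance between point clouds of shape (N, 3) and (M, 3).

    Uses squared Euclidean nearest-neighbor distances, averaged per cloud
    and summed over both directions (a common convention; not necessarily
    the paper's exact normalization).
    """
    # Pairwise squared distances, shape (N, M).
    diff = pts_a[:, None, :] - pts_b[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Nearest neighbor in B for each point of A, and vice versa.
    a_to_b = d2.min(axis=1).mean()
    b_to_a = d2.min(axis=0).mean()
    return float(a_to_b + b_to_a)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.random((1024, 3))                    # points sampled from a ground-truth mesh
    pred = gt + rng.normal(0.0, 0.01, gt.shape)   # a slightly perturbed predicted mesh
    print(f"Chamfer distance: {chamfer_distance(pred, gt):.6f}")
```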