Abstract

Semantic indoor 3D modeling with multi-task deep neural networks is an efficient, low-cost way to reconstruct an indoor scene with a geometrically complete room structure and semantically labeled 3D objects. Challenged by the complexity and clutter of indoor scenes, the semantic reconstruction quality of current methods is still limited by insufficient exploration and learning of 3D geometry information. To this end, this paper proposes an end-to-end multi-task neural network for geometry-enhanced semantic 3D reconstruction of RGB-D indoor scenes, termed GeoRec. In GeoRec, we build a geometry extractor that effectively learns geometry-enhanced feature representations from depth data to improve the estimation accuracy of the room layout, camera pose, and 3D object bounding boxes. We also introduce a novel object mesh generator that strengthens the robustness of GeoRec to indoor occlusion through geometry-enhanced implicit shape embedding. With the parsed scene semantics and geometries, GeoRec reconstructs an indoor scene by placing the reconstructed object meshes, according to the 3D object detection results, within the estimated layout cuboid. Extensive experiments on two benchmark datasets show that GeoRec yields outstanding performance: a mean chamfer distance error of 5.19×10⁻³ for object reconstruction on the challenging Pix3D dataset, 70.45% mAP for 3D object detection, and 77.1% 3D mIoU for layout estimation on the commonly used SUN RGB-D dataset. Notably, the mesh reconstruction sub-network of GeoRec trained on Pix3D can be transferred directly to SUN RGB-D without any fine-tuning, demonstrating its high generalization ability.
