Fusion of images and point clouds for the semantic segmentation of large-scale 3D scenes based on deep learning

Rui Zhang, Guangyun Li, Minglei Li, Li Wang

doi:10.1016/j.isprsjprs.2018.04.022

Abstract

We address the semantic segmentation of large-scale 3D scenes by fusing 2D images and 3D point clouds. First, we build a Deeplab-Vgg16 based Large-Scale and High-Resolution model (DVLSHR) on the deep Visual Geometry Group network (VGG16) and fine-tune it by training seven deep convolutional neural networks on four benchmark datasets. On the CityScapes validation set, DVLSHR achieves 74.98% mean Pixel Accuracy (mPA) and 64.17% mean Intersection over Union (mIoU), and it can be adapted to segment the captured images (resolution 2832 × 4256 pixels). Second, the preliminary segmentation results from the 2D images are mapped to the 3D point clouds according to the coordinate relationships between the images and the point clouds. Third, based on the mapping results, fine features of buildings are extracted directly from the 3D point clouds. Our experiments show that the proposed fusion method segments both local and global features efficiently and effectively.
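
The second step, mapping 2D segmentation labels onto 3D points, can be illustrated with a minimal sketch. It assumes a simple pinhole camera model, which may differ from the coordinate relationship actually used in this work. The function name `transfer_labels`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the requirement that points are already expressed in the camera frame are hypothetical choices made for illustration, and occlusion between points is ignored.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def transfer_labels(
    points_cam: Sequence[Point3D],
    label_image: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[int]]:
    """Give each 3D point the label of the pixel it projects onto.

    Assumptions (illustrative only): points are in the camera frame
    (x right, y down, z forward), the camera is an ideal pinhole, and
    label_image[v][u] holds the class id of pixel column u, row v.
    Points behind the camera or outside the image receive None.
    """
    height = len(label_image)
    width = len(label_image[0]) if height else 0
    labels: List[Optional[int]] = []
    for x, y, z in points_cam:
        if z <= 0.0:
            # The point lies behind the image plane and cannot be seen.
            labels.append(None)
            continue
        # Perspective projection to the nearest pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            labels.append(label_image[v][u])
        else:
            labels.append(None)
    return labels
```

A practical pipeline would also need the extrinsic transform from the scanner frame to the camera frame and a visibility test, so that points hidden behind closer surfaces do not inherit labels from the objects that occlude them.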

Full Text