Abstract

3D semantic maps play an increasingly important role in a wide variety of applications, especially for task-driven robots. In this paper, we present a semantic mapping methodology that obtains 3D semantic maps from RGB-D scans. In contrast to existing methods that use 3D annotations as supervision, we focus on accurate 2D frame labeling and combine the labels in 3D space through a semantic fusion mechanism. For scene parsing, a two-stream network with a novel discriminatory mask loss is proposed to fully extract and fuse RGB and depth information, achieving stable semantic segmentation. The discriminatory mask guides the cross-entropy loss function by modulating the influence of individual pixels on back-propagation, which reduces the harmful effects of depth noise and fallible annotations at object edges. Once the correspondences between frames are available, the semantic frames are fused in unified 3D coordinates using a novel label-oriented voxelgrid filter. By introducing a label-oriented statistical principle into labeled point clouds, it ensures intra-frame spatial continuity and inter-frame spatiotemporal consistency. To avoid unfavorable interference between uncorrelated frames, we further propose an adaptive grouping algorithm that applies a view frustum filter to group frames with sufficient overlap into a segment. Finally, we demonstrate the effectiveness of the proposed method on the 2D/3D semantic label benchmarks of the ScanNetv2 and Cityscapes datasets.
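
As a minimal sketch of two of the components described above, the snippet below shows (i) a per-pixel cross-entropy loss weighted by a discriminatory mask and (ii) a label-oriented voxelgrid filter that keeps the majority (statistical) label per occupied voxel. The function names, the form of the mask, and the voxel size are illustrative assumptions, not the paper's exact formulation.

import numpy as np
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits, target, mask):
    # logits: (B, C, H, W) class scores; target: (B, H, W) integer labels
    # mask:   (B, H, W) per-pixel weights in [0, 1]; small values down-weight
    #         unreliable pixels, e.g. object edges affected by depth noise or
    #         fallible annotations (assumed form of the discriminatory mask)
    per_pixel = F.cross_entropy(logits, target, reduction="none")
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1e-6)

def label_oriented_voxel_filter(points, labels, voxel_size=0.05):
    # points: (N, 3) coordinates in the unified world frame
    # labels: (N,)   per-point semantic labels back-projected from 2D frames
    # Returns one centroid and the majority label per occupied voxel.
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    order = np.argsort(inverse)
    groups = np.split(order, np.cumsum(counts)[:-1])
    centroids, voted = [], []
    for idx in groups:
        centroids.append(points[idx].mean(axis=0))
        vals, freqs = np.unique(labels[idx], return_counts=True)
        voted.append(vals[np.argmax(freqs)])  # label-oriented statistic: majority vote
    return np.stack(centroids), np.array(voted)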
