Abstract

3D point cloud semantic segmentation is of great significance for self-driving and virtual reality, and it is an important research topic in 3D vision. In this paper, we present a Multi-scale Up-sampling Aggregation Network (MUAN) to improve semantic segmentation performance in sophisticated 3D environments. We adopt PAConv as the backbone of MUAN. First, to overcome the weak capture ability of the Multi-Layer Perceptron (MLP) in low-dimensional space, we introduce an Attentive ScoreNet (ASN) module that combines a Point Feature Enrichment (PFE) module with an attention mechanism to provide effective distribution scores for the weight matrices. Second, to address the semantic gap between encoders and decoders, we present a novel Multi-scale Up-sampling Aggregation (MUA) module that expands the receptive field before semantic prediction and fuses multi-scale features drawn from both encoder and decoder blocks. Third, to help the MUA module achieve better results, we design a Bilateral Feature Fusion (BFF) module that boosts the global awareness of the network in each decoder block. Together, the BFF and MUA modules attend to the multi-scale features of the encoder and decoder, improving both the semantic representation of the point cloud and the likelihood of correct semantic prediction. Experimental results show that MUAN improves mIoU by at least 2.36% on S3DIS Area-5 and class mIoU by 0.5% on ShapeNet compared to PAConv.
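
As a rough illustration of the multi-scale up-sampling aggregation idea, the PyTorch sketch below upsamples decoder features from several resolutions to the finest point set and fuses them before per-point classification. This is not the paper's MUA implementation: the class name, tensor shapes, channel widths, and the use of 1D linear interpolation in place of point-based feature propagation are all assumptions for illustration.

    # Hypothetical sketch of a multi-scale up-sampling aggregation head.
    # All names and the interpolation scheme are assumptions, not the
    # paper's actual MUA module.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiScaleAggregationHead(nn.Module):
        """Fuses decoder features from several resolutions before prediction."""

        def __init__(self, in_channels, fused_channels, num_classes):
            # in_channels: list of channel counts, one per decoder stage
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Conv1d(sum(in_channels), fused_channels, 1),
                nn.BatchNorm1d(fused_channels),
                nn.ReLU(inplace=True),
            )
            self.classifier = nn.Conv1d(fused_channels, num_classes, 1)

        def forward(self, feats):
            # feats: list of (B, C_i, N_i) tensors, coarsest to finest.
            # Upsample every stage to the finest resolution, then fuse,
            # so the prediction sees a wider effective receptive field.
            n_points = feats[-1].shape[-1]
            upsampled = [
                F.interpolate(f, size=n_points, mode="linear", align_corners=False)
                for f in feats
            ]
            fused = self.fuse(torch.cat(upsampled, dim=1))
            return self.classifier(fused)  # (B, num_classes, N)

    # Toy usage: three decoder stages with 1024/2048/4096 points.
    feats = [torch.randn(2, c, n) for c, n in [(256, 1024), (128, 2048), (64, 4096)]]
    head = MultiScaleAggregationHead([256, 128, 64], 128, num_classes=13)
    logits = head(feats)  # -> (2, 13, 4096)

A real point-cloud pipeline would replace the linear interpolation with distance-weighted feature propagation from each coarse point set onto the original points; the fixed-size tensors here merely keep the sketch self-contained.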
