Abstract

Point-based networks have been widely used in the semantic segmentation of point clouds owing to the powerful 3D convolutional neural network (CNN) baseline. Most current methods resort to intermediate regular representations to reorganize the structure of point clouds for 3D CNNs, but they may neglect the inherent contextual information. In this work, we focus on capturing discriminative features with an interactive attention mechanism and propose a novel method consisting of a regional simplified dual attention network and a global graph convolution network. First, we cluster homogeneous points into superpoints and construct a superpoint graph, which reduces the computational complexity while largely preserving the spatial topological relations among superpoints. Second, we integrate cross-position attention and cross-channel attention into a single-head attention module and design a novel interactive attention gating (IAG)-based multilayer perceptron (MLP) network (IAG–MLP), which expands the receptive field and augments discriminative features in local embeddings. The combination of stacked IAG–MLP blocks and the global graph convolution network, called IAGC, then learns high-dimensional local features in superpoints and progressively updates these local embeddings with a recurrent neural network (RNN). Our proposed framework is evaluated on three open indoor benchmarks, and the 6-fold cross-validation results on the S3DIS dataset show that the local IAG–MLP network brings about 1% and 6.1% improvement in overall accuracy (OA) and mean class intersection-over-union (mIoU), respectively, compared with the PointNet local network. Furthermore, our IAGC network outperforms other CNN-based approaches on the ScanNet V2 dataset by at least 7.9% in mIoU. The experimental results indicate that the proposed method can better capture contextual information and achieve competitive overall performance in the semantic segmentation task.
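
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how OA and mIoU are commonly computed from per-point labels. The function names and the toy labels are illustrative assumptions, not taken from the paper, and skipping classes that appear in neither the ground truth nor the predictions is one common convention rather than necessarily the one used by the authors.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of points whose predicted label equals the true label."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:  # assumption: ignore classes absent from both label sets
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Hypothetical toy labels for six points and three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(overall_accuracy(cm), mean_iou(cm))
```

Because mIoU averages over classes rather than points, it penalizes errors on rare classes more heavily than OA does, which is consistent with the two metrics moving by different amounts in the comparison above.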

Highlights

  • PointNet and Graph Attention Convolution (GAC) only executed local networks with multilayer perceptron (MLP) and graph attention convolution, respectively, while both PointNet++ and SPG implemented global aggregation with local embeddings derived from PointNet

  • Interactive attention gating (IAG)–MLP executes an interactive attention mechanism in which the embeddings can be dominated by the augmented features from the combination of multiple feature channels in the dot-product-enhanced procedure, which benefits objects that show distinctive geometry-based and color-based characteristics

  • We present a novel 3D deep architecture for semantic segmentation in indoor scenes, named Interactive Attention-based Graph Convolution (IAGC)
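
The idea of combining cross-position and cross-channel attention through dot products can be illustrated with a generic, parameter-free sketch. This is not the IAG–MLP module itself: the learned projections, the gating, and the MLP layers described by the authors are omitted, and all function names and the toy feature matrix are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def position_attention(x):
    """x is N x C. Each point is re-expressed as a weighted mix of all points."""
    weights = softmax_rows(matmul(x, transpose(x)))  # N x N point affinities
    return matmul(weights, x)                        # N x C


def channel_attention(x):
    """Each channel is re-expressed as a weighted mix of all channels."""
    xt = transpose(x)                                # C x N
    weights = softmax_rows(matmul(xt, x))            # C x C channel affinities
    return transpose(matmul(weights, xt))            # back to N x C


def dual_attention(x):
    """Residual sum of the input and both attention branches (an assumed fusion)."""
    pos = position_attention(x)
    ch = channel_attention(x)
    return [[a + b + c for a, b, c in zip(r0, r1, r2)]
            for r0, r1, r2 in zip(x, pos, ch)]


# Hypothetical features: 4 points in one superpoint, 3 channels each.
x = [[1.0, 0.0, 2.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.3, 1.0],
     [2.0, 1.0, 0.5]]
out = dual_attention(x)
```

In this sketch the position branch mixes information across points and the channel branch mixes information across feature channels, so the output keeps the input shape while every entry depends on the whole local neighborhood.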


Summary

Introduction

In the reconstruction of indoor environments, laser scanning point clouds have been widely used, providing high-precision and rich spatial information about buildings. Effective semantic segmentation should be conducted before retrieving geometric entities, as it enables better scene understanding and high-accuracy, entity-based modeling [2,3]. In the past few years, segmentation methods have mainly focused on designing handcrafted features [4–6] using empirical knowledge about spatial geometry or symmetry
