Abstract

3D point cloud classification has been an active research topic in recent years. Unlike regular data such as images and text, a 3D point cloud is unordered, which makes two-dimensional (2D) convolutional neural networks (CNNs) difficult to apply directly. When extracting features from the input data, it is important to capture both global and local information effectively. In this paper, we propose a 3D model classification method based on a multi-head self-attention mechanism that consumes sparse point clouds and learns a robust latent representation of the 3D point cloud. The framework is composed of self-attention layers, multilayer perceptrons (MLPs), a fully connected (FC) layer, a max-pooling layer, and a softmax layer. The feature vector of each point includes spatial coordinates and shape descriptors, and these are encoded by the self-attention layers to extract the relationships among points. The outputs of the attention layers are concatenated and fed into MLPs to extract features. After the MLPs transform them into the expected dimension, a max-pooling layer is applied to obtain high-level features, which are then passed to the fully connected layer. A softmax layer determines the category of the 3D model. The proposed method is evaluated on ModelNet40. Experimental results show that the proposed method is robust to rotation variance, position variance, and point sparsity.
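
The following is a minimal sketch, in PyTorch, of the pipeline described above: per-point feature vectors are projected into an embedding space, passed through multi-head self-attention, concatenated with the attention output, lifted by shared MLPs, max-pooled into a global feature, and classified through a fully connected head with softmax. The layer sizes, the number of heads, and the use of nn.MultiheadAttention are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PointCloudClassifier(nn.Module):
    def __init__(self, in_dim=6, embed_dim=64, num_heads=4, num_classes=40):
        super().__init__()
        # Project each point's feature vector (coordinates + shape descriptors)
        # into the attention embedding space.
        self.input_proj = nn.Linear(in_dim, embed_dim)
        # Multi-head self-attention over the unordered set of points.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Shared pointwise MLP that lifts features to a higher dimension.
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, 1024), nn.ReLU(),
        )
        # Fully connected head; softmax over categories is applied in forward().
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        # x: (batch, num_points, in_dim) -- coordinates plus shape descriptors
        h = self.input_proj(x)
        attn_out, _ = self.attn(h, h, h)
        # Concatenate the attention output with the projected input features.
        h = torch.cat([h, attn_out], dim=-1)
        h = self.mlp(h)
        # Max-pooling over points yields an order-invariant global feature.
        g = h.max(dim=1).values
        return torch.softmax(self.fc(g), dim=-1)

# Example: classify a batch of two sparse clouds with 256 points each.
model = PointCloudClassifier()
clouds = torch.randn(2, 256, 6)
probs = model(clouds)   # (2, 40) class probabilities, one row per cloud
```

The max-pooling over the point dimension is what makes the global feature invariant to the order of the input points, which is why the framework can consume unordered point clouds.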

Highlights

  • 3D model classification is key to 3D model retrieval

  • Features of the point cloud are extracted by a multi-head self-attention mechanism

  • The feature vector is constructed from coordinates and geometric information (a minimal construction sketch follows this list)
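
As a concrete example of such a feature vector, the sketch below concatenates each point's 3D coordinates with a surface normal estimated by PCA over its k nearest neighbours. The choice of normals as the geometric descriptor, the neighbourhood size k, and the helper name point_features are illustrative assumptions; the paper's actual shape descriptors may differ.

```python
import numpy as np

def point_features(points, k=16):
    """points: (N, 3) coordinates; returns (N, 6) rows of [x, y, z, nx, ny, nz]."""
    n = points.shape[0]
    feats = np.zeros((n, 6))
    feats[:, :3] = points
    # Pairwise distances, used to find the k nearest neighbours of each point.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn = np.argsort(dists, axis=1)[:, :k]
    for i in range(n):
        nbrs = points[knn[i]]
        # PCA normal estimation: the eigenvector of the neighbourhood covariance
        # with the smallest eigenvalue approximates the surface normal.
        eigvals, eigvecs = np.linalg.eigh(np.cov(nbrs.T))
        feats[i, 3:] = eigvecs[:, 0]
    return feats

# Example: build 6-D feature vectors for a random sparse cloud of 256 points.
cloud = np.random.rand(256, 3)
features = point_features(cloud)   # (256, 6), fed to the self-attention layers
```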

Summary

RELATED WORK

In the 3D shape classification task, local features are extracted first, and a global shape representation of the 3D object is then obtained by aggregation methods. A network has been proposed to exploit correlative information from multiple views, in which CNNs are used to extract low-level features from each view. Another approach extracts contextual features from local neighborhoods in the point cloud, with a novel module designed to capture interactions between points [22]. A neural network has been presented to classify point clouds that incorporates local neighborhood information and learns global shape information [29]. A CNN has been given in which the convolution operator is applied to each point and pointwise features are captured; these features are used for object recognition [30]. We propose a novel approach for classifying 3D models that takes sparse point clouds as input and learns a shape representation of the 3D model. The influence of different scales on classification is not considered here.

GEOMETRIC FEATURES
THE PROPOSED METHOD
Findings
CONCLUSION