Abstract

3D object detection from point clouds has received considerable attention in autonomous driving, robotics, and virtual reality. However, feature learning for 3D object detection from point clouds is challenging due to the irregularity and sparsity of point cloud data. Grid-based methods convert irregular point clouds into regular 2D views or 3D voxels and then apply 2D or 3D CNNs for feature learning, but this transformation inevitably introduces quantization loss. Point-based methods use PointNet to learn features directly from the raw point cloud, but the semantic information obtained by PointNet may be incomplete. To address these issues, we propose a novel Gateway Attention-based Point Set Abstraction 3D object detector (GAPSA) that learns geometric and semantic point cloud features. Specifically, the framework downsamples points via set abstraction and extracts local features around the sampled points with the proposed gateway attention pooling module, learning more discriminative point cloud features. Given the high-quality 3D proposals generated by the attention-based backbone network, we design an RoI multi-pooling head that adaptively learns features from the sparse points of interest within each proposal, encoding richer contextual information and obtaining fine-grained features to accurately estimate object confidence and location. Experimental results demonstrate that, compared with advanced point-based 3D object detectors, our attention-based point set abstraction detector achieves the best detection performance on the KITTI and nuScenes datasets. The code is available at https://github.com/liuhuaijjin/GAPSA.
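To make the attention-based aggregation concrete, the sketch below illustrates one plausible way to replace max pooling in a set abstraction layer with an attention-weighted aggregation over each sampled point's neighbours, followed by a channel-wise gate. This is a minimal illustration only: the module name, layer sizes, and the sigmoid gate are assumptions, not the authors' implementation, which is available in the linked repository.

```python
# Minimal sketch of attention-gated neighbour pooling inside a set abstraction
# layer. All design details here (score MLP, sigmoid gate) are assumed for
# illustration; see https://github.com/liuhuaijjin/GAPSA for the actual code.
import torch
import torch.nn as nn


class GatewayAttentionPooling(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        # Scores each neighbour feature; softmax is taken over the K neighbours.
        self.score_fn = nn.Linear(in_channels, 1)
        # Channel-wise gate modulating the aggregated feature (assumed design).
        self.gate = nn.Sequential(nn.Linear(in_channels, in_channels), nn.Sigmoid())

    def forward(self, neighbour_feats: torch.Tensor) -> torch.Tensor:
        # neighbour_feats: (B, N, K, C) = batch, sampled points, neighbours, channels
        attn = torch.softmax(self.score_fn(neighbour_feats), dim=2)  # (B, N, K, 1)
        pooled = (attn * neighbour_feats).sum(dim=2)                 # (B, N, C)
        return self.gate(pooled) * pooled                            # gated output


if __name__ == "__main__":
    feats = torch.randn(2, 1024, 16, 64)               # 1024 sampled points, 16 neighbours
    print(GatewayAttentionPooling(64)(feats).shape)    # torch.Size([2, 1024, 64])
```

Compared with the max pooling used in standard PointNet++ set abstraction, a weighted aggregation of this kind lets each sampled point retain contributions from all of its neighbours rather than only the strongest activation per channel.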
