Accurate 3D environmental perception is crucial for ensuring the safety of autonomous vehicles. However, many existing point-based 3D object detection methods rely only on independent point features to extract spatial information and ignore the dependencies between points. Moreover, these methods usually aggregate neighboring point features with max-pooling operations, without considering the stability of local geometric information. To address these issues, this paper proposes a novel two-stage LiDAR-based key point transformer, consisting of a region proposal network module and a region-based transformer module, to enhance 3D detection performance. Firstly, raw, unprocessed point clouds are taken as the model input to preserve detailed spatial structural information. Secondly, in the region proposal network, an enhanced PointNet++ is used to segment foreground points and generate high-quality first-stage proposals. Then, the representation of the proposals is refined by the region-based transformer module, which models long-range relationships among key points within the regions of interest and diffuses spatial information into the feature channels. Additionally, experiments on the KITTI dataset show that the proposed method surpasses most mainstream 3D object detection methods and achieves a detection speed of 14.28 Hz, striking a good balance between detection accuracy and speed. Finally, the proposed method is deployed on real vehicle hardware platforms for online detection, demonstrating its industrial value.
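The sketch below illustrates the two-stage idea described above, assuming PyTorch. It is a minimal sketch, not the authors' implementation: the per-point MLP stands in for the paper's enhanced PointNet++, and `PointBackbone`, `RegionTransformer`, the key-point count `K`, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of a two-stage point-based detector with a region-based
# transformer refinement, assuming PyTorch. All module names and sizes are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class PointBackbone(nn.Module):
    """Per-point feature extractor (stand-in for the enhanced PointNet++)."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 32), nn.ReLU(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # Foreground/background score per point, used to seed proposals.
        self.seg_head = nn.Linear(feat_dim, 1)

    def forward(self, points):               # points: (B, N, 3)
        feats = self.mlp(points)              # (B, N, feat_dim)
        fg_logits = self.seg_head(feats)      # (B, N, 1)
        return feats, fg_logits


class RegionTransformer(nn.Module):
    """Refines each proposal by self-attending over its K key points,
    modeling long-range point-to-point dependencies inside the RoI
    instead of relying on a plain max-pool."""

    def __init__(self, feat_dim: int = 64, nhead: int = 4, layers: int = 2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.box_head = nn.Linear(feat_dim, 7)  # (x, y, z, l, w, h, yaw)

    def forward(self, roi_feats):             # roi_feats: (R, K, feat_dim)
        refined = self.encoder(roi_feats)      # attention across key points
        pooled = refined.mean(dim=1)           # (R, feat_dim)
        return self.box_head(pooled)           # (R, 7) refined boxes


if __name__ == "__main__":
    B, N, K, R = 2, 1024, 32, 8
    backbone = PointBackbone()
    refiner = RegionTransformer()
    pts = torch.randn(B, N, 3)                 # raw point cloud (x, y, z)
    feats, fg = backbone(pts)
    # Stage one would turn foreground scores into proposals; here we just
    # take the top-K foreground points of the first scene per dummy RoI.
    topk = fg[0, :, 0].topk(K).indices
    roi_feats = feats[0, topk].unsqueeze(0).expand(R, K, -1)
    boxes = refiner(roi_feats)
    print(boxes.shape)                         # torch.Size([8, 7])
```

In this sketch the transformer encoder plays the role attributed to the region-based module: every key point inside a region of interest attends to every other, so the pooled proposal feature reflects pairwise geometric dependencies rather than only the single strongest activation per channel.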