Abstract
3D object detection from LiDAR point clouds plays an important role in autonomous driving. Balancing inference speed and detection accuracy is difficult because point cloud data is large and stored without structure, which makes its features hard to represent. To address this challenge, we propose AFV-RCNN, a two-stage point cloud object detector. In stage 1, we introduce an attention flexible voxel feature encoding layer, which uses flexible voxels to speed up feature encoding and applies voxel attention to focus on the foreground points to be detected. In stage 2, we design a multi-level, grid-based, multi-scale RoI (Region of Interest) feature fusion module. It extracts complete 3D structures directly from the 3D region proposals and captures both local and global features through multi-scale partitioning. During training, GHM-C Loss is applied to the classification task to address the imbalance between target categories and between hard and easy samples. We evaluate the model on the public KITTI dataset and the Waymo Open Dataset. On KITTI, the 3D detection mAP is 73.41%, and inference on a single GPU reaches 30.0 fps. Compared with other state-of-the-art methods, AFV-RCNN combines the inference speed of a single-stage detector with the detection accuracy of a two-stage detector, achieving higher detection accuracy while processing the point cloud efficiently.
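To make the role of GHM-C Loss concrete, the following is a minimal NumPy sketch of the general gradient-harmonizing idea for binary classification: samples are binned by gradient norm, and each sample's cross-entropy is reweighted by the inverse of the density of its bin, so that the many easy samples do not dominate training. The function name `ghm_c_loss`, the choice of 10 bins, and the toy inputs are assumptions made for illustration; the abstract does not state the configuration used in AFV-RCNN.

```python
import numpy as np

def ghm_c_loss(logits, targets, bins=10):
    """Sketch of a GHM-C style loss (hypothetical helper, not the paper's code).

    logits  : raw classification scores, shape (N,)
    targets : binary labels in {0, 1}, shape (N,)
    bins    : number of equal-width gradient-norm bins (assumed value)
    Returns (loss, beta), where beta holds the per-sample weights.
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    n = logits.size

    # Gradient norm of sigmoid cross-entropy w.r.t. the logit: g = |p - y|.
    p = 1.0 / (1.0 + np.exp(-logits))
    g = np.abs(p - targets)

    # Assign each sample to one of `bins` unit regions covering [0, 1].
    idx = np.minimum((g * bins).astype(int), bins - 1)
    counts = np.bincount(idx, minlength=bins)

    # Gradient density = samples in the bin / bin width (width is 1 / bins).
    density = counts[idx] * bins
    beta = n / density  # samples in crowded bins get smaller weights

    # Numerically stable binary cross-entropy computed from logits.
    ce = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(beta * ce)), beta

# Toy batch: four easy negatives, one ambiguous positive, one hard negative.
loss, beta = ghm_c_loss([-4.0, -4.0, -4.0, -4.0, 0.0, 3.0], [0, 0, 0, 0, 1, 0])
```

In this toy batch the four easy negatives share one bin and are each down-weighted relative to the two samples that sit alone in their bins, which is the rebalancing effect the abstract attributes to GHM-C Loss.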