Abstract
Developing 3D object detectors for LiDAR point clouds remains a significant challenge in real-world autonomous driving scenarios. Current research focuses mainly on voxel-based detectors, which use sparse convolution for training and inference. These models often require substantial computational resources for training, which makes them hard to deploy on real autonomous vehicles. Two models, PointPillars and the pillar version of CenterPoint, stand out because they are based on 2D pillar encoding, which makes inference fast. However, their detection accuracy is lower than that of other models. To improve the detection accuracy of pillar-encoding models without significantly increasing computational complexity, this paper proposes attention modules placed inside the pillar encoder. These modules apply the attention mechanism while reducing the input dimensions. Attention modules are also added to the CNN backbone network to further increase detection accuracy. Compared with the fastest PointPillars model, the inference time increases from 16 to 17 ms. Experiments demonstrate the effectiveness of the proposed network. © 2025 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
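The abstract does not describe the internal layout of the attention modules, so the following is only a minimal sketch of one way attention with reduced dimensions could sit inside a pillar encoder. It assumes a squeeze-and-excitation-style channel attention with a bottleneck, followed by the max-pooling over points used in PointNet-style pillar encoders. The names `channel_attention`, `encode_pillar`, `init_params`, and the reduction ratio `r` are hypothetical and are not taken from the paper.

```python
import math
import random


def linear(vec, weights, bias):
    """Dense layer; `weights` holds one row of coefficients per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def channel_attention(points, w_down, b_down, w_up, b_up):
    """Reweight the channels of one pillar through a reduced-dimension bottleneck.

    points: list of N points, each a list of C features (assumed layout).
    """
    n, c = len(points), len(points[0])
    # Squeeze: average every channel over the points in the pillar.
    mean = [sum(p[j] for p in points) / n for j in range(c)]
    # Reduce C -> C // r with a ReLU so the attention branch stays cheap.
    hidden = [max(0.0, h) for h in linear(mean, w_down, b_down)]
    # Expand back to C and map to gates in (0, 1) with a sigmoid.
    gates = [1.0 / (1.0 + math.exp(-g)) for g in linear(hidden, w_up, b_up)]
    # Scale every point's features by the per-channel gates.
    return [[x * g for x, g in zip(p, gates)] for p in points]


def encode_pillar(points, params):
    """Apply channel attention, then max-pool over points to get one feature vector."""
    attended = channel_attention(points, *params)
    c = len(attended[0])
    return [max(p[j] for p in attended) for j in range(c)]


def init_params(c, r, rng):
    """Random bottleneck weights for C channels and reduction ratio r (hypothetical)."""
    h = c // r
    w_down = [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(h)]
    w_up = [[rng.uniform(-0.5, 0.5) for _ in range(h)] for _ in range(c)]
    return w_down, [0.0] * h, w_up, [0.0] * c


# Usage: one pillar with 5 points and 8 features per point, reduction ratio 4.
rng = random.Random(0)
params = init_params(8, 4, rng)
pillar = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]
pillar_feature = encode_pillar(pillar, params)
```

In this sketch the attention branch operates on a vector of length `C // r` rather than `C`, which is one plausible reading of "reducing input dimensions"; the actual modules in the paper may reduce a different axis or use a different attention form.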