Abstract

This article studies one-stage 3-D object detection from light detection and ranging (LiDAR) point clouds and red-green-blue (RGB) images, aiming to boost 3-D object detection accuracy with three attention mechanisms. Most previous works convert LiDAR point clouds into bird's-eye-view (BEV) images and achieve significant performance, but they suffer from the partial loss of height information (z-axis values) during the conversion. To eliminate this problem, the height information of the LiDAR point cloud is projected onto the image plane and embedded into the original RGB image to generate a new image, named RGB^D. This is the first attention mechanism used to improve 3-D detection accuracy. The two other attention mechanisms extract more discriminative global and local features, respectively: the global attention network is appended to the feature encoder, and the local attention network is used for view-specific region-of-interest fusion. Extensive experiments on the KITTI benchmark suite show that the proposed approach outperforms state-of-the-art LiDAR-camera-based methods on the car class (easy, moderate, hard): 2-D (90.35%, 88.47%, 86.98%), 3-D (85.12%, 76.23%, 74.46%), and BEV (89.64%, 86.23%, 85.60%).
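To make the RGB^D construction described above concrete, the following is a minimal sketch, not the authors' implementation: it assumes KITTI-style calibration matrices (P2, R0_rect, Tr_velo_to_cam), and the function name make_rgbd, the height-normalization range, and all variable names are illustrative assumptions.

```python
import numpy as np

def make_rgbd(rgb, points, P2, Tr_velo_to_cam, R0_rect, z_range=(-3.0, 1.0)):
    """Sketch of RGB^D generation: project LiDAR points onto the image
    plane and embed their height (z in the LiDAR frame) as a 4th channel.

    rgb:             (H, W, 3) uint8 image
    points:          (N, 3) LiDAR points (x, y, z) in the velodyne frame
    P2:              (3, 4) camera projection matrix (KITTI calibration)
    Tr_velo_to_cam:  (3, 4) LiDAR-to-camera extrinsics
    R0_rect:         (3, 3) rectification rotation
    z_range:         assumed height interval used for normalization
    """
    H, W, _ = rgb.shape

    # Homogeneous LiDAR points -> rectified camera frame.
    pts_h = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])  # (N, 4)
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)                         # (3, N)

    # Keep only points in front of the camera.
    front = cam[2] > 0.1
    cam = cam[:, front]
    z_vals = points[front, 2]

    # Project to pixel coordinates with P2.
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])               # (4, N)
    uvw = P2 @ cam_h
    u = (uvw[0] / uvw[2]).astype(int)
    v = (uvw[1] / uvw[2]).astype(int)

    # Discard projections that fall outside the image.
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z_vals = u[inside], v[inside], z_vals[inside]

    # Normalize heights to [0, 255] and scatter into a sparse height map.
    z_lo, z_hi = z_range
    z_norm = np.clip((z_vals - z_lo) / (z_hi - z_lo), 0.0, 1.0) * 255.0
    height = np.zeros((H, W), dtype=np.uint8)
    height[v, u] = z_norm.astype(np.uint8)

    # RGB^D: the original image with the height map as an extra channel.
    return np.dstack([rgb, height])
```

The resulting (H, W, 4) tensor can then be fed to the image branch of the detector in place of the plain RGB input; the sparse height channel is what restores the z-axis information lost in the BEV conversion.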
