Abstract

3D object detection in LiDAR point clouds is widely used in autonomous driving, intelligent robotics, and augmented reality. Although one-stage 3D detectors offer satisfactory training and inference speed, their performance still suffers from insufficient utilization of bird's eye view (BEV) information. In this paper, a new backbone network is proposed to perform cross-layer fusion of multi-scale BEV feature maps, making full use of information at different scales for detection. Specifically, our proposed backbone network can be divided into a coarse branch and a fine branch. In the coarse branch, we use a pyramidal feature hierarchy (PFH) to generate multi-scale BEV feature maps, which retain the advantages of the different levels and serve as the input of the fine branch. In the fine branch, our proposed pyramid splitting and aggregation (PSA) module deeply integrates the multi-scale feature maps across levels, thereby improving the expressive power of the final features. Extensive experiments on the challenging KITTI-3D benchmark show that our method outperforms several previous state-of-the-art methods in both 3D and BEV object detection. Experimental results in terms of average precision (AP) demonstrate the effectiveness of our network.
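The coarse-to-fine design described above can be pictured with a short sketch. The following PyTorch code is a minimal illustration only, not the authors' implementation: the class name MultiScaleBEVFusion, the channel widths, and the resize-concatenate-1x1-conv fusion rule are all assumptions made for demonstration of the general cross-layer fusion idea.

```python
# Minimal sketch (assumed details, not the paper's released code) of
# cross-layer fusion over multi-scale BEV feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBEVFusion(nn.Module):
    """Fuses BEV feature maps from several pyramid levels at each scale.

    Every level is resized to the target scale, the resized maps are
    concatenated along channels (the pyramid is "split" across scales),
    and a 1x1 conv aggregates them back to that level's channel width.
    """
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        self.fuse = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(sum(channels), c, kernel_size=1, bias=False),
                nn.BatchNorm2d(c),
                nn.ReLU(inplace=True),
            )
            for c in channels
        )

    def forward(self, feats):
        # feats: list of BEV maps [N, C_i, H_i, W_i] from the coarse branch.
        outs = []
        for i, f in enumerate(feats):
            target = f.shape[-2:]
            # Resize every pyramid level to the current level's spatial size.
            resized = [
                g if g.shape[-2:] == target
                else F.interpolate(g, size=target, mode="bilinear",
                                   align_corners=False)
                for g in feats
            ]
            outs.append(self.fuse[i](torch.cat(resized, dim=1)))
        return outs

# Example: three BEV levels at strides 1/2/4 over a 200x176 grid.
if __name__ == "__main__":
    feats = [torch.randn(2, 64, 200, 176),
             torch.randn(2, 128, 100, 88),
             torch.randn(2, 256, 50, 44)]
    fused = MultiScaleBEVFusion()(feats)
    print([tuple(f.shape) for f in fused])
```

In this reading, each refined level sees context from every other level while keeping its own resolution, which is one common way to realize the "splitting and aggregation" of a feature pyramid.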

Highlights

  • We propose a new method for cross-layer fusion of multi-scale feature maps, which uses the pyramid splitting and aggregation (PSA) module to integrate information from different levels

  • We propose a novel backbone network to extract robust features from the bird's eye view, which combines the advantages of cross-layer fusion features and the original multi-scale features


Introduction

Convolutional neural networks (CNNs) have played a pivotal role in addressing the issues of object detection [1,2,3], semantic segmentation [4,5,6], and image super-resolution [7,8,9]. Although the average precision (AP) of 2D car detection is already considerable, autonomous driving remains a challenging task. The accuracy of 3D object detection directly impacts the safety and reliability of autonomous driving. Because RGB images lack the necessary depth information, many researchers have turned their attention to point cloud data, which retain the accurate spatial information of objects. With the popularity of LiDAR and RGB-D cameras, acquiring point cloud data has become more convenient and feasible. How to effectively utilize the reliable information in point cloud data for 3D object detection remains a challenging problem.
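As a concrete illustration of the BEV representation that detectors of this kind operate on, the sketch below projects a raw point cloud onto a 2D grid. The grid extents, the 0.4 m resolution, and the occupancy/height channels are illustrative assumptions loosely following common KITTI practice, not values taken from this paper.

```python
# Minimal sketch (assumed grid settings) of projecting a LiDAR point
# cloud onto a bird's eye view (BEV) grid.
import numpy as np

def points_to_bev(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
                  z_range=(-3.0, 1.0), resolution=0.4):
    """Return a 2-channel BEV map: occupancy and max height per cell."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z = x[mask], y[mask], z[mask]
    cols = ((x - x_range[0]) / resolution).astype(np.int64)
    rows = ((y - y_range[0]) / resolution).astype(np.int64)
    h = int(round((y_range[1] - y_range[0]) / resolution))
    w = int(round((x_range[1] - x_range[0]) / resolution))
    bev = np.zeros((2, h, w), dtype=np.float32)
    bev[0, rows, cols] = 1.0                              # occupancy
    np.maximum.at(bev[1], (rows, cols), z - z_range[0])   # height above floor
    return bev

# Example with random points inside the detection range.
pts = np.column_stack([np.random.uniform(0, 70, 1000),
                       np.random.uniform(-40, 40, 1000),
                       np.random.uniform(-3, 1, 1000)])
print(points_to_bev(pts).shape)  # (2, 200, 176)
```

A pseudo-image like this is what a 2D CNN backbone, such as the one proposed in this paper, then processes to produce multi-scale BEV feature maps.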
