Abstract

Aiming at accurate and real-time object detection on 3D point clouds, we propose a single-stage deep neural network that introduces new solutions in three aspects: network architecture, loss function design, and data augmentation. First, the point clouds are directly voxelized to build a binary bird's eye view (BEV) map. The network is specially designed to combine semantic and positional information from the point clouds into a final feature map. When regressing object bounding boxes from the bird's eye view, an extra prediction-error regression term is included in the loss function to achieve convergence with higher precision. During training, a special data augmentation scheme mixes 3D point clouds from different frames to improve the generalization performance of the network. Experimental results show that our approach outperforms state-of-the-art methods on the KITTI BEV object detection benchmark at a frame rate of 20 Hz, using only the position information of LIDAR point clouds.
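The binary BEV voxelization mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid extent, resolution, and function name are illustrative assumptions, and each BEV cell is simply set to 1 if any LIDAR point falls inside it.

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                      resolution=0.1):
    """Voxelize LIDAR points (N, 3) into a binary bird's-eye-view map.

    Extent and resolution are illustrative assumptions, not the
    paper's actual configuration.
    """
    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    bev = np.zeros((h, w), dtype=np.uint8)
    # Keep only points inside the chosen ground-plane extent.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    # Map metric x/y coordinates to grid indices.
    rows = ((pts[:, 0] - x_range[0]) / resolution).astype(int)
    cols = ((pts[:, 1] - y_range[0]) / resolution).astype(int)
    bev[rows, cols] = 1  # binary occupancy: 1 if any point hits the cell
    return bev
```

A map built this way discards height and intensity, which is consistent with the abstract's claim of using only the position information of the point cloud.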
