Abstract

In this paper, we present a new deep neural network architecture that detects objects in bird's eye view (BEV) from Lidar sensor data in autonomous driving scenarios. The key idea of the proposed method is to improve detection accuracy by exploiting the 3D global context provided by the whole set of Lidar points. The overall structure of the proposed method consists of two parts: 1) the detection core network (DetNet) and 2) the context extraction network (ConNet). First, the DetNet generates the BEV representation by projecting the Lidar points onto the BEV plane and applies a CNN to extract feature maps that are locally activated on the objects. The ConNet directly processes the whole set of Lidar points to produce a $1 \times 1 \times k$ feature vector capturing the 3D geometric structure of the surroundings at the global scale. The context vector produced by the ConNet is concatenated to each pixel of the feature maps obtained by the DetNet. The combined feature maps are used to regress the oriented bounding box and identify the category of each object. Experiments on the public KITTI dataset show that the use of the context feature offers a significant performance gain over the baseline, and that the proposed object detector achieves performance competitive with state-of-the-art 3D object detectors.
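The fusion described in the abstract can be pictured with a minimal sketch: a point-set encoder produces a global context vector that is broadcast and concatenated to every pixel of the BEV feature maps before the box and class heads. The PyTorch module below is illustrative only; the layer sizes, the PointNet-style encoder inside `ConNet`, and the specific head parameterization are assumptions, and only the overall data flow follows the text.

```python
# Illustrative sketch of the context-fusion idea (not the authors' implementation).
import torch
import torch.nn as nn


class ConNet(nn.Module):
    """Encodes the whole Lidar point set into a 1 x 1 x k global context vector."""

    def __init__(self, k=256):
        super().__init__()
        # Shared per-point MLP (PointNet-style assumption), then global max pooling.
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, k, 1),
        )

    def forward(self, points):            # points: (B, 3, N)
        feats = self.mlp(points)           # (B, k, N)
        return feats.max(dim=2).values     # (B, k) global context vector


class DetNetWithContext(nn.Module):
    """BEV CNN backbone whose feature maps are fused with the global context."""

    def __init__(self, bev_channels=32, feat_channels=128, k=256, num_classes=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(bev_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_channels, 3, padding=1), nn.ReLU(),
        )
        fused = feat_channels + k
        # Heads: class scores and an oriented box (x, y, w, l, sin, cos) per cell.
        self.cls_head = nn.Conv2d(fused, num_classes, 1)
        self.box_head = nn.Conv2d(fused, 6, 1)

    def forward(self, bev, context):       # bev: (B, C, H, W), context: (B, k)
        feats = self.backbone(bev)          # (B, F, H, W)
        B, _, H, W = feats.shape
        # Broadcast the context vector and concatenate it to every pixel.
        ctx = context[:, :, None, None].expand(B, context.shape[1], H, W)
        fused = torch.cat([feats, ctx], dim=1)
        return self.cls_head(fused), self.box_head(fused)


if __name__ == "__main__":
    points = torch.randn(2, 3, 4096)        # raw Lidar points
    bev = torch.randn(2, 32, 200, 200)       # BEV projection of the same scene
    con_net, det_net = ConNet(), DetNetWithContext()
    cls_map, box_map = det_net(bev, con_net(points))
    print(cls_map.shape, box_map.shape)      # (2, 3, 200, 200) (2, 6, 200, 200)
```

In this sketch the per-pixel concatenation is what lets every BEV location see the scene-level geometry; any equivalent fusion (e.g. addition after a projection layer) would require changing only the head input width.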
