Abstract

Prior works in voxel-based LiDAR 3D object detection have demonstrated promising results in detecting a variety of road objects such as cars, pedestrians, and cyclists. However, these works generally collapse the feature space from a 3D volume into a 2D bird's-eye view (BEV) map before generating object proposals, in order to speed up inference. As a result, the resolution of information along the z-axis is reduced significantly. In this work, we hypothesize that augmenting the BEV features with features obtained from a front view (FV) map may allow the network to partially recover the high-resolution z-axis information. This augmentation lets object proposals still be inferred in the BEV, maintaining the fast runtime, while simultaneously improving 3D detection performance. To support our hypothesis, we design a multi-view attention module that augments the BEV features with the FV features, and we conduct extensive experiments on the widely used KITTI dataset. The experimental results show that our method improves various existing voxel-based 3D object detection networks by a significant margin.
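The abstract does not specify the internals of the multi-view attention module, but the idea of augmenting BEV features with FV features can be illustrated with a generic cross-attention sketch: each BEV cell acts as a query over the FV feature map, and the attended FV context is added back to the BEV feature. All shapes and the residual-fusion choice here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_bev_with_fv(bev, fv):
    """Illustrative cross-attention fusion (assumed design, not the paper's).

    bev: (N_bev, C) flattened BEV feature map (queries)
    fv:  (N_fv, C)  flattened front-view feature map (keys/values)
    Returns BEV features augmented with attended FV context.
    """
    C = bev.shape[1]
    scores = bev @ fv.T / np.sqrt(C)   # (N_bev, N_fv) scaled similarities
    attn = softmax(scores, axis=1)     # each BEV cell attends over FV cells
    context = attn @ fv                # (N_bev, C) FV context per BEV cell
    return bev + context               # residual augmentation of BEV features

# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
bev = rng.standard_normal((200, 64))   # e.g. 200 BEV cells, 64 channels
fv = rng.standard_normal((120, 64))    # e.g. 120 FV cells, 64 channels
fused = fuse_bev_with_fv(bev, fv)
print(fused.shape)                     # same shape as the BEV input
```

The residual form keeps the BEV pathway intact, so proposals can still be generated from the (now augmented) BEV map without changing the downstream detection head.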

