Abstract

Real-time three-dimensional (3D) object detection has become a crucial component of autonomous driving systems. Recent research demonstrates that voxel-based feature aggregation is accurate and efficient in large 3D scenes. However, the choice of voxel size is a sensitive parameter because of the trade-off between detection performance and inference speed. To alleviate this problem, in this paper we propose a sparse multi-scale voxel feature aggregation network (SMS-Net), a novel one-stage, end-to-end network built around a sparse multi-scale-fusion (SMSF) module and a shallow-to-deep regression (SDR) module. First, the raw point clouds are divided into voxels at different scales to construct diverse 3D sparse feature maps. The SMSF module then attentively aggregates point-wise features with a perspective-channel attention mechanism and fuses multi-scale features at the 3D sparse feature-map level to capture more fine-grained shape information. In addition, the SDR module boosts localization and 3D box estimation accuracy through multiple aggregations at the feature-map level, with little additional computational overhead. Extensive experiments demonstrate the performance improvements contributed by each module of the proposed method. On the KITTI 3D object detection benchmark, for example, SMS-Net outperforms most state-of-the-art one-stage methods, and its performance is even comparable to that of two-stage methods, while running at a real-time inference speed of 42 Hz. SMS-Net also achieves state-of-the-art performance on the nuScenes 3D detection benchmark.
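The multi-scale voxelization step described in the abstract can be illustrated with a minimal sketch. All names here are illustrative assumptions, and mean pooling stands in for the paper's attention-based aggregation, which the abstract does not specify in detail:

```python
import numpy as np

def voxelize(points, voxel_size):
    """Assign each 3D point to a voxel at the given voxel size and
    aggregate the points in each voxel by averaging (an illustrative
    stand-in for the paper's attentive aggregation in the SMSF module)."""
    idx = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # Collapse duplicate voxel indices; `inverse` maps each point to its voxel.
    keys, inverse = np.unique(idx, axis=0, return_inverse=True)
    feats = np.zeros((len(keys), points.shape[1]))
    np.add.at(feats, inverse, points)            # sum point features per voxel
    feats /= np.bincount(inverse)[:, None]       # mean over points in each voxel
    # A sparse 3D feature map: only occupied voxels are stored.
    return dict(zip(map(tuple, keys), feats))

# Build sparse feature maps at several scales, as in the multi-scale setup.
np.random.seed(0)
points = np.random.rand(1000, 3) * 10.0          # synthetic scene, 10 m cube
scales = [0.2, 0.4, 0.8]                         # illustrative voxel sizes (m)
maps = {s: voxelize(points, s) for s in scales}
```

Coarser voxel sizes yield fewer occupied voxels (faster inference, coarser shape detail), while finer sizes preserve geometry at higher cost; fusing the resulting sparse feature maps across scales is what the SMSF module addresses.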
