Abstract

Accurate 3D information is essential in autonomous driving and mobile robotics. Monocular 3D object detection offers a more economical solution than traditional LiDAR-dependent methods, but because a single image lacks explicit depth cues, efficiently detecting objects in 3D space from monocular input is extremely challenging. To mitigate this issue, we first identify how the depth information of surrounding pixels provides additional support for depth map estimation in the 3D detection of driving scenes. Based on this observation, we design Depth Dynamic Center Difference Convolution (DDCDC), which introduces surrounding-pixel cues into depth estimation and applies different convolution kernel weights to each pixel of every input. This module not only overcomes the limitations of conventional 2D convolution but also highlights the difference in depth between targets and background, so that more attention is paid to objects of interest. Finally, we design an end-to-end monocular 3D object detection network built on the proposed DDCDC modules. To demonstrate the effectiveness of our method, we validate it on two datasets, KITTI and nuScenes. DDCDC achieves the most significant improvement over existing methods in a simple setup. Our results on the KITTI split1/split2 validation sets are 23.83/21.48, 16.00/13.92, and 12.04/10.59 (easy, moderate, and hard), while our results on the nuScenes test set are mAP = 0.364 and NDS = 0.434.
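The abstract does not give the exact formulation of DDCDC, but its description (a center-difference term over surrounding pixels combined with per-pixel dynamic kernel weights) suggests a layer along the lines of the sketch below. This is a minimal PyTorch illustration under stated assumptions, not the authors' implementation: the module name DDCDCSketch, the mixing parameter theta, and the sigmoid gating branch are all hypothetical. The difference term is computed with the standard central-difference-convolution identity, i.e. convolving the neighbor-minus-center signal with a kernel equals the vanilla convolution minus a 1x1 convolution with the summed kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DDCDCSketch(nn.Module):
    """Hypothetical sketch of a Depth Dynamic Center Difference Convolution.

    Combines a vanilla 3x3 convolution with a center-difference term
    (which responds to depth contrast between a pixel and its surrounding
    pixels) and a per-pixel dynamic gate that re-weights the difference
    response at every spatial location. Names and structure are
    illustrative assumptions, not the paper's actual design.
    """

    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.theta = theta  # assumed upper bound on the difference weight
        # Shared 3x3 kernel used for both the vanilla and difference terms.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        # Lightweight branch predicting a per-pixel gate in [0, 1], so the
        # layer can emphasize the difference term on object pixels, where
        # depth contrast against the background matters most.
        self.gate = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=True),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out_vanilla = self.conv(x)
        # Center-difference term: conv over (x(p_n) - x(p_0)) equals
        # conv(x) minus a 1x1 convolution with the spatially summed kernel.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_center = F.conv2d(x, kernel_sum)
        out_diff = out_vanilla - out_center
        # Dynamic per-pixel mix of the vanilla and difference responses.
        g = self.theta * self.gate(x)
        return (1.0 - g) * out_vanilla + g * out_diff


if __name__ == "__main__":
    layer = DDCDCSketch(in_ch=64, out_ch=64)
    feat = torch.randn(2, 64, 96, 320)  # e.g. a backbone feature map
    print(layer(feat).shape)  # torch.Size([2, 64, 96, 320])
```

When the gate outputs zero, the layer reduces to a plain 2D convolution; when it saturates, each output pixel is driven mostly by local depth contrast, which matches the stated goal of separating foreground objects from background.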
