Abstract

Monocular 3D object detection aims to localize objects in 3D space from a single image. This is a difficult problem due to the lack of accurate depth measurements. Many methods predict depth upfront with a pre-trained vision-based depth estimator to assist 3D object detection. However, these methods achieve only limited improvements because of inaccurate depth estimates and their neglect of depth confidence. In this paper, we propose a new end-to-end 3D object detection framework that combines a monocular camera with a cheap 4-beam LiDAR. The minimal LiDAR signal is leveraged as an additional input to predict high-quality, dense depth maps from monocular images. Meanwhile, 3D proposals are generated by a keypoint-based detector. The key challenge is encoding depth confidence to capture the uncertainty of the depth estimates. We therefore propose probabilistic instance shape reconstruction, which exploits instance shape information for box refinement. Our method is a fully differentiable end-to-end framework, making it simple and efficient. Experimental results on the KITTI dataset demonstrate that the proposed method achieves state-of-the-art performance, validating the effectiveness of the sparse LiDAR data and the probabilistic instance shape reconstruction. Code is available at https://github.com/xjtuwh/SparseLiDAR_fusion.
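To make the "cheap 4-beam LiDAR" input concrete: a common way to study such a sensor on KITTI is to sparsify a dense depth map down to a handful of scan rows. The sketch below is not the authors' code; the function name `sparsify_depth` and the even-row selection are illustrative assumptions about how a 4-beam signal might be simulated.

```python
import numpy as np

def sparsify_depth(dense_depth: np.ndarray, num_beams: int = 4) -> np.ndarray:
    """Simulate a sparse few-beam LiDAR signal (illustrative only):
    keep `num_beams` evenly spaced rows of the depth map, zero the rest."""
    sparse = np.zeros_like(dense_depth)
    rows = np.linspace(0, dense_depth.shape[0] - 1, num_beams).astype(int)
    sparse[rows] = dense_depth[rows]
    return sparse

# Toy usage: an 8x4 "depth map" with constant depth 10 m
dense = np.full((8, 4), 10.0)
sparse = sparsify_depth(dense, num_beams=4)
```

A sparse map like this, concatenated with the RGB image, would serve as the extra input channel from which the network completes a dense depth estimate.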
