Abstract

Current monocular 3D object detection algorithms generally suffer from inaccurate depth estimation, which leads to reduction of detection accuracy. The depth error from image-to-image generation for the stereo view is insignificant compared with the gap in single-image generation. Therefore, a novel pseudo-monocular 3D object detection framework is proposed, which is called Pseudo-Mono. Particularly, stereo images are brought into monocular 3D detection. Firstly, stereo images are taken as input, then a lightweight depth predictor is used to generate the depth map of input images. Secondly, the left input images obtained from stereo camera are used as subjects, which generate enhanced visual feature and multi-scale depth feature by depth indexing and feature matching probabilities, respectively. Finally, sparse anchors set by the foreground probability maps and the multi-scale feature maps are used as reference points to find the suitable initialization approach of object query. The encoded visual feature is adopted to enhance object query for enabling deep interaction between visual feature and depth feature. Compared with popular monocular 3D object detection methods, Pseudo-Mono is able to achieve richer fine-grained information without additional data input. Extensive experimental results on the datasets of KITTI, NuScenes, and MS-COCO demonstrate the generalizability and portability of the proposed method. The effectiveness and efficiency of Pseudo-Mono have been demonstrated by extensive ablation experiments. Experiments on a real vehicle platform have shown that the proposed method maintains high performance in complex real-world environments.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call