Abstract
Current monocular 3D object detection algorithms generally suffer from inaccurate depth estimation, which degrades detection accuracy. The depth error of image-to-image depth generation from stereo views is insignificant compared with that of single-image generation. Therefore, a novel pseudo-monocular 3D object detection framework, named Pseudo-Mono, is proposed, which brings stereo images into monocular 3D detection. Firstly, stereo images are taken as input, and a lightweight depth predictor generates the depth maps of the input images. Secondly, the left images of the stereo pairs are used as subjects, from which enhanced visual features and multi-scale depth features are generated by depth indexing and feature-matching probabilities, respectively. Finally, sparse anchors set by the foreground probability maps and the multi-scale feature maps are used as reference points to find a suitable initialization of the object queries. The encoded visual features are adopted to enhance the object queries, enabling deep interaction between visual and depth features. Compared with popular monocular 3D object detection methods, Pseudo-Mono achieves richer fine-grained information without additional data input. Extensive experimental results on the KITTI, NuScenes, and MS-COCO datasets demonstrate the generalizability and portability of the proposed method, and extensive ablation experiments demonstrate its effectiveness and efficiency. Experiments on a real vehicle platform show that the proposed method maintains high performance in complex real-world environments.
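The paper's implementation is not reproduced here, but the pipeline stages named in the abstract (lightweight stereo depth prediction, depth-enhanced visual features, and sparse foreground-anchored query initialization) can be sketched in PyTorch. The following is a minimal, hypothetical illustration: all module names, channel sizes, the additive fusion step, and the top-k anchor selection are assumptions for clarity, not the authors' architecture.

```python
# Hypothetical sketch of the Pseudo-Mono pipeline stages described in the
# abstract. Shapes, modules, and fusion choices are illustrative assumptions.
import torch
import torch.nn as nn


class LightweightDepthPredictor(nn.Module):
    """Stand-in for the lightweight depth predictor: maps a stereo pair
    to a per-pixel positive depth map aligned with the left image."""
    def __init__(self, in_ch=6, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 3, padding=1), nn.Softplus(),
        )

    def forward(self, left, right):
        # Concatenate the stereo pair along channels and regress depth.
        return self.net(torch.cat([left, right], dim=1))  # (B, 1, H, W)


class PseudoMonoSketch(nn.Module):
    """Illustrative pipeline: stereo depth prediction -> depth-enhanced
    visual features -> sparse object queries from foreground anchors."""
    def __init__(self, feat_ch=64, num_queries=50):
        super().__init__()
        self.depth_predictor = LightweightDepthPredictor()
        self.backbone = nn.Conv2d(3, feat_ch, 3, padding=1)  # toy visual encoder
        self.depth_embed = nn.Conv2d(1, feat_ch, 1)          # depth -> feature space
        self.foreground_head = nn.Conv2d(feat_ch, 1, 1)      # foreground probability
        self.num_queries = num_queries

    def forward(self, left, right):
        depth = self.depth_predictor(left, right)            # (B, 1, H, W)
        visual = self.backbone(left)                         # (B, C, H, W)
        # Depth-enhanced visual features (assumed additive fusion).
        fused = visual + self.depth_embed(depth)
        # Sparse anchors: top-k foreground locations serve as reference points.
        fg_prob = torch.sigmoid(self.foreground_head(fused)) # (B, 1, H, W)
        topk = fg_prob.flatten(2).topk(self.num_queries, dim=2).indices  # (B, 1, K)
        # Initialize object queries from fused features at the anchor locations.
        queries = fused.flatten(2).gather(
            2, topk.expand(-1, fused.size(1), -1))           # (B, C, K)
        return depth, fg_prob, queries.transpose(1, 2)       # queries: (B, K, C)


if __name__ == "__main__":
    model = PseudoMonoSketch()
    left, right = torch.randn(2, 3, 64, 128), torch.randn(2, 3, 64, 128)
    depth, fg, q = model(left, right)
    print(depth.shape, fg.shape, q.shape)  # (2,1,64,128) (2,1,64,128) (2,50,64)
```

In this reading, the detector remains monocular at inference over features (only the left image feeds the visual branch), while the stereo pair is used solely to obtain a more reliable depth map, which matches the abstract's claim of gaining fine-grained depth cues without additional data modalities.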