Abstract

Depth estimation is crucial for understanding the geometry of a scene in robotics and computer vision. Traditionally, depth estimators are trained with various forms of self-supervised stereo data or supervised ground-truth data. Compared with methods that rely on stereo depth perception or ground-truth data from laser scans, estimating depth from an unlabeled monocular camera is considerably more challenging. Recent work has shown that CNN-based depth estimators can be trained on unlabeled monocular video. Without requiring stereo or ground-truth depth data, monocular self-supervised training can exploit much larger and more varied image datasets. Inspired by recent advances in depth estimation, we propose a novel objective that replaces explicit ground-truth depth or binocular stereo supervision with unlabeled monocular video sequences. The proposed architecture makes no assumptions about scene geometry and uses no pre-trained information. To improve pose prediction, we propose an improved differentiable direct visual odometry (DDVO) module fused with an appearance-matching loss. An auto-masking approach is introduced into the DDVO depth predictor to filter out low-texture and occluded regions, which reduces matching error from one frame to the next in the monocular sequence. Additionally, we introduce a self-supervised loss function that fuses the auto-masking and depth-prediction components. Our method produces state-of-the-art results for monocular depth estimation on the KITTI driving dataset, even outperforming some supervised methods trained with ground-truth depth.
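
To make the appearance-matching loss and auto-masking ideas concrete, the following is a minimal sketch of how such terms are commonly implemented in self-supervised monocular depth pipelines. It is not the paper's implementation: the function names (`ssim`, `appearance_matching_loss`, `automasked_loss`), the SSIM/L1 weighting, and the mask rule are illustrative assumptions based on standard practice.

```python
# Hypothetical sketch of an appearance-matching (photometric) loss with
# auto-masking for self-supervised monocular depth training.
# Names and constants are illustrative, not taken from the paper.
import torch
import torch.nn.functional as F


def ssim(x, y):
    """Simplified structural-similarity dissimilarity term using 3x3 average pooling."""
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)


def appearance_matching_loss(pred, target, alpha=0.85):
    """Weighted SSIM + L1 difference between a warped source frame and the target frame."""
    l1 = (pred - target).abs().mean(1, keepdim=True)
    return alpha * ssim(pred, target).mean(1, keepdim=True) + (1 - alpha) * l1


def automasked_loss(warped_src, src, target):
    """Auto-masking: keep only pixels where warping the source frame (using the
    predicted depth and pose) matches the target better than the unwarped source,
    discarding low-texture or static regions that would otherwise add matching error."""
    reprojection = appearance_matching_loss(warped_src, target)
    identity = appearance_matching_loss(src, target)
    mask = (reprojection < identity).float()  # 1 where the warp actually helps
    return (mask * reprojection).sum() / mask.sum().clamp(min=1.0)
```

In this sketch, `warped_src` would be the source frame resampled into the target view using the depth and DDVO pose predictions; the mask suppresses pixels whose photometric error is already low without warping, which is the behavior the abstract attributes to auto-masking.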
