Abstract
We present an unsupervised learning method for monocular depth estimation. In common with many recent works, we train a convolutional neural network (CNN) on stereo image pairs, using view reconstruction as the self-supervisory signal. In contrast to previous work, we employ a network that estimates the stereo camera parameters, which makes our model robust to diversity in the training data. Our second contribution is self-supervision correction. It addresses a serious drawback of stereo pair self-supervision in unsupervised monocular depth estimation: at later training stages, self-supervision by view reconstruction fails to improve the predicted depth map because of ambiguities in the input images. We mitigate this problem by having the depth estimation CNN produce both a depth map and a correction map, which is used to modify the input stereo pair images in the areas of ambiguity. These contributions allow us to achieve state-of-the-art results among unsupervised methods on the KITTI driving dataset by training our model on a hybrid city driving dataset.
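The view reconstruction signal described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work. It assumes rectified grayscale stereo images, a disparity map already derived from the predicted depth, an L1 photometric loss, and a correction map that is simply added to the target image. The function names `warp_horizontally` and `reconstruction_loss` are hypothetical, and the additive form of the correction is an assumption made for illustration only.

```python
import numpy as np


def warp_horizontally(image, disparity):
    """Sample `image` at x - disparity, with linear interpolation along each row.

    Assumes rectified stereo, so corresponding pixels lie on the same row.
    """
    h, w = image.shape
    # Source x coordinate for every target pixel, clamped to the image border.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * image[rows, x0] + frac * image[rows, x1]


def reconstruction_loss(left, right, disparity, correction=None):
    """Mean L1 error between the left image and its reconstruction from the right.

    `correction` is a hypothetical additive map applied to the left image
    (an assumption for illustration; the abstract does not specify its form).
    """
    target = left if correction is None else left + correction
    reconstructed = warp_horizontally(right, disparity)
    return float(np.mean(np.abs(target - reconstructed)))


# Toy example: the right image is a horizontal ramp and the left image is the
# same ramp shifted by two pixels, so the true disparity is 2 everywhere.
right = np.tile(np.arange(8, dtype=np.float64), (4, 1))
left = np.clip(right - 2.0, 0.0, None)

loss_true = reconstruction_loss(left, right, 2.0)
loss_wrong = reconstruction_loss(left, right, 0.0)

# A pixel that violates photometric consistency (for example, a specular
# highlight) leaves a residual loss even at the true disparity. A correction
# map restricted to that pixel removes the residual.
left_ambiguous = left.copy()
left_ambiguous[1, 5] += 10.0
loss_ambiguous = reconstruction_loss(left_ambiguous, right, 2.0)
correction = warp_horizontally(right, 2.0) - left_ambiguous
loss_corrected = reconstruction_loss(left_ambiguous, right, 2.0, correction)
```

In this toy setting the loss is lower at the true disparity than at a wrong one, which is what lets view reconstruction supervise depth. The ambiguous pixel shows why the signal stalls: no disparity value explains it, so only a change to the input image can remove its contribution to the loss.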