Depth Prediction without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

Vincent Casser, Anelia Angelova, Soeren Pirk, Reza Mahjourian

doi:10.1609/aaai.v33i01.33018001

Abstract

Learning to predict scene depth from RGB inputs is a challenging task for both indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has established strong baselines in the domain. We propose a novel approach which produces higher-quality results, is able to model moving objects, and is shown to transfer across data domains, e.g., from outdoor to indoor scenes. The main idea is to introduce geometric structure into the learning process by modeling the scene and the individual objects; camera ego-motion and object motions are learned from monocular videos as input. Furthermore, an online refinement method is introduced to adapt learning on the fly to unknown domains. The proposed approach outperforms all state-of-the-art approaches, including those that handle motion, e.g., through learned flow. Our results are comparable in quality to those that use stereo as supervision, and they significantly improve depth prediction on scenes and datasets which contain a lot of object motion. The approach is of practical relevance, as it allows transfer across environments: models trained on data collected for robot navigation in urban scenes can be transferred to indoor navigation settings. The code associated with this paper can be found at https://sites.google.com/view/struct2depth.
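The abstract does not spell out how monocular video can supervise depth and ego-motion. A minimal sketch of the geometric step such methods commonly rely on is given below, assuming a pinhole camera: each pixel of one frame is back-projected with its predicted depth, moved by the predicted camera motion, and projected into a neighboring frame, so that the two frames can be compared photometrically. The function name `reproject`, the intrinsics `K`, and the pose `(R, t)` are illustrative assumptions and are not taken from the paper or its released code.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Map each target-frame pixel to pixel coordinates in a source frame.

    depth: (h, w) depth map for the target frame (hypothetical network output).
    K:     (3, 3) pinhole camera intrinsics (assumed known).
    R, t:  rotation (3, 3) and translation (3,) from target to source camera
           (hypothetical ego-motion estimate).
    Returns an (h, w, 2) array of (u, v) coordinates in the source frame.
    """
    h, w = depth.shape
    # Homogeneous pixel grid, shape (3, h*w): rows are u, v, 1.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)]).astype(np.float64)
    # Back-project pixels to 3D points in the target camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Apply the camera motion to express the points in the source frame.
    pts_src = R @ pts + np.asarray(t, dtype=np.float64).reshape(3, 1)
    # Project onto the source image plane and dehomogenize.
    proj = K @ pts_src
    uv = proj[:2] / proj[2:3]
    return uv.T.reshape(h, w, 2)

# Usage: with constant depth and a small sideways translation,
# every pixel shifts by fx * tx / depth along the u axis.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
coords = reproject(np.full((3, 4), 2.0), K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

In a training setup of this kind, the returned coordinates would be used to sample the source image and compare the result with the target image; that sampling and the loss are omitted here. Handling individually moving objects, which the abstract highlights, would require a separate motion estimate per object and is not shown.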

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jul 17, 2019
Citations: 420

Similar Papers

Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras
Ariel Gordon ... Hanhan Li
01 Oct 2019

Unsupervised Learning of Depth from Monocular Videos Using 3D-2D Corresponding Constraints
Fusheng Jin ... Chuanbing Wan
Remote Sensing | VOL. 13
01 May 2021

Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos
Zhenyao Wu ... Xiaoping Zhang
01 Oct 2019

Unsupervised learning of depth estimation, camera motion prediction and dynamic object localization from video
Delong Yang ... Xiafu Peng
International Journal of Advanced Robotic Systems | VOL. 17
01 Mar 2020
