Abstract

In this paper, we propose a deep neural network that simultaneously estimates camera poses and reconstructs full-resolution depth maps of the environment from consecutive monocular images. In contrast to traditional monocular visual odometry methods, which cannot recover metrically scaled depth, we demonstrate the recovery of scale by using a sparse depth image as a supervision signal during training. Based on the scaled depth, the relative poses between consecutive images are then estimated by the proposed network. A further novelty lies in the deployment of view synthesis, which generates a new image of the scene from a different view (camera pose) given an input image. View synthesis is the core technique used to construct the loss function of the proposed network; because it requires both the predicted depths and the relative poses, it couples visual odometry and depth prediction together. In this way, both the estimated poses and the predicted depths are scaled by the sparse depth supervision during training. Experimental results on the KITTI dataset show that our method performs competitively in challenging environments.
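The view-synthesis loss described above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): pixels of the target view are back-projected with the predicted depth, transformed by the predicted relative pose (R, t), re-projected into the source view with the camera intrinsics K, and the re-sampled source image is compared photometrically with the target. For simplicity it uses nearest-neighbour sampling and an L1 error; the function name and signature are assumptions.

```python
import numpy as np

def view_synthesis_loss(I_target, I_source, depth, K, R, t):
    """Hypothetical sketch of a view-synthesis photometric loss.

    I_target, I_source: grayscale images of shape (H, W).
    depth: predicted depth of the target view, shape (H, W).
    K: 3x3 camera intrinsics; (R, t): predicted relative pose.
    """
    H, W = depth.shape
    K_inv = np.linalg.inv(K)
    # Homogeneous pixel grid of the target view.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    # Back-project to 3-D with the predicted depth, move into the source frame.
    cam = (K_inv @ pix) * depth.reshape(1, -1)
    cam_src = R @ cam + t.reshape(3, 1)
    # Re-project into the source image plane.
    proj = K @ cam_src
    us = proj[0] / proj[2]
    vs = proj[1] / proj[2]
    # Nearest-neighbour sampling; discard pixels projecting outside the image.
    ui = np.round(us).astype(int)
    vi = np.round(vs).astype(int)
    valid = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    synth = I_source[vi[valid], ui[valid]]
    # L1 photometric error over valid pixels: the training signal that
    # couples depth prediction and pose estimation.
    return np.abs(I_target.reshape(-1)[valid] - synth).mean()
```

Because the projected coordinates depend on both the depth and the pose, gradients of this error flow into both network outputs, which is what ties the two tasks together in training (a differentiable bilinear sampler would replace the nearest-neighbour step in practice).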
