Abstract

In this paper, we propose a deep neural network that simultaneously estimates camera poses and reconstructs full-resolution depth maps of the environment from consecutive monocular images. In contrast to traditional monocular visual odometry methods, which cannot recover metrically scaled depth, we demonstrate the recovery of scale by using a sparse depth image as a supervision signal during training. Based on the scaled depth, the relative poses between consecutive images are then estimated by the proposed network. A further novelty lies in the deployment of view synthesis, which generates a new image of the scene from a different view (camera pose) given an input image. View synthesis is the core technique used to construct the loss function of the proposed network; because it requires both the predicted depths and the relative poses, it couples visual odometry and depth prediction together. In this way, both the estimated poses and the predicted depths are scaled by the sparse depth supervision during training. Experimental results on the KITTI dataset show that our method performs competitively in challenging environments.
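The view-synthesis loss described above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): pixels of the target view are back-projected with the predicted depth, transformed by the predicted relative pose (R, t), re-projected into the source view with the camera intrinsics K, and the re-sampled source image is compared photometrically with the target. For simplicity it uses nearest-neighbour sampling and an L1 error; the function name and signature are assumptions.

```python
import numpy as np

def view_synthesis_loss(I_target, I_source, depth, K, R, t):
    """Hypothetical sketch of a view-synthesis photometric loss.

    I_target, I_source: grayscale images of shape (H, W).
    depth: predicted depth of the target view, shape (H, W).
    K: 3x3 camera intrinsics; (R, t): predicted relative pose.
    """
    H, W = depth.shape
    K_inv = np.linalg.inv(K)
    # Homogeneous pixel grid of the target view.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    # Back-project to 3-D with the predicted depth, move into the source frame.
    cam = (K_inv @ pix) * depth.reshape(1, -1)
    cam_src = R @ cam + t.reshape(3, 1)
    # Re-project into the source image plane.
    proj = K @ cam_src
    us = proj[0] / proj[2]
    vs = proj[1] / proj[2]
    # Nearest-neighbour sampling; discard pixels projecting outside the image.
    ui = np.round(us).astype(int)
    vi = np.round(vs).astype(int)
    valid = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    synth = I_source[vi[valid], ui[valid]]
    # L1 photometric error over valid pixels: the training signal that
    # couples depth prediction and pose estimation.
    return np.abs(I_target.reshape(-1)[valid] - synth).mean()
```

Because the projected coordinates depend on both the depth and the pose, gradients of this error flow into both network outputs, which is what ties the two tasks together in training (a differentiable bilinear sampler would replace the nearest-neighbour step in practice).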
