Abstract

This paper proposes DVONet, an unsupervised learning framework for monocular depth estimation and visual odometry (VO). The framework is trained on stereo image sequences and can estimate absolute-scale scene depth and camera poses from monocular images. To mitigate the effect of stereo occlusions during training and improve depth estimation, a left-right occlusion mask is introduced. In addition, a novel VO network is proposed in which the feature extraction network is shared between pose estimation and optical flow estimation. DVONet achieves state-of-the-art results for both depth estimation and VO on the KITTI driving dataset, outperforming existing unsupervised methods and performing comparably to traditional ones.
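The abstract does not say how the left-right occlusion mask is computed. One common way to build such a mask is a left-right disparity consistency check, sketched below in Python with NumPy. It is a hypothetical illustration, not DVONet's implementation. The function names, the sign convention (a left pixel at column x matches column x - d in the right image), the nearest-neighbour sampling and the 1-pixel threshold are all assumptions. The second function shows why stereo training can yield absolute scale: with a known focal length f (pixels) and baseline B (metres), metric depth follows from disparity d as depth = f·B/d.

```python
import numpy as np


def left_right_occlusion_mask(disp_left, disp_right, threshold=1.0):
    """Hypothetical left-right consistency check (an assumption, not the paper's method).

    disp_left, disp_right: (H, W) disparity maps in pixels, positive values.
    Assumed convention: left pixel (y, x) matches right pixel (y, x - disp_left[y, x]).
    Returns a boolean (H, W) mask, True where the left pixel is judged visible
    in the right image and False where it is treated as occluded.
    """
    h, w = disp_left.shape
    ys, xs = np.indices((h, w))
    # Column in the right image each left pixel projects to (nearest neighbour).
    x_right = np.rint(xs - disp_left).astype(int)
    # Pixels that project outside the right image cannot be matched.
    in_bounds = (x_right >= 0) & (x_right < w)
    x_right = np.clip(x_right, 0, w - 1)
    # Disparity that the right view assigns to the matched pixel.
    disp_back = disp_right[ys, x_right]
    # Consistent pixels agree (within the threshold) in both views.
    consistent = np.abs(disp_left - disp_back) < threshold
    return in_bounds & consistent


def disparity_to_depth(disp, focal_px, baseline_m, eps=1e-6):
    """Convert disparity (pixels) to metric depth via depth = f * B / d (assumed rectified stereo)."""
    return focal_px * baseline_m / np.maximum(disp, eps)


# Usage: a 1 x 5 toy example where the right view disagrees at one pixel.
disp_l = np.ones((1, 5))
disp_r = np.ones((1, 5))
disp_r[0, 2] = 3.0  # left pixel x=3 maps to right x=2, so it fails the check
mask = left_right_occlusion_mask(disp_l, disp_r)
depth = disparity_to_depth(disp_l, focal_px=100.0, baseline_m=0.5)
```

In this toy case the leftmost pixel falls outside the right image and the pixel at x=3 fails the consistency check, so both are masked out. A training pipeline would more likely use differentiable bilinear sampling than the nearest-neighbour rounding used here for simplicity.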
