Abstract

Simultaneous localization and mapping (SLAM) tasks have been shown to benefit greatly from depth information about the environment. In this paper, we first present an unsupervised end-to-end learning framework for monocular depth and camera motion estimation from video sequences. Unlike existing unsupervised methods, we not only use image reconstruction as the supervisory signal but also exploit the pose estimation techniques of traditional SLAM approaches to strengthen that signal and impose additional training constraints on monocular depth and camera motion estimation. Furthermore, we successfully use our unsupervised framework to assist the traditional ORB-SLAM system when its initialization module cannot match enough features. Qualitative and quantitative experiments show that our unsupervised framework surpasses supervised methods on depth estimation and outperforms the previous state-of-the-art unsupervised approach by 13.5% on the KITTI dataset. For pose estimation, our method performs comparably to supervised methods trained on ground-truth pose data. Moreover, our framework significantly accelerates the initialization of the traditional ORB-SLAM system and effectively improves the accuracy of environmental mapping in scenes with strong lighting and weak texture.
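The image-reconstruction supervision mentioned above is typically a view-synthesis loss: predicted depth and camera motion are used to warp a source frame into the target view, and the photometric error between the warped and actual target images trains both networks without ground truth. The following numpy sketch illustrates that loss under stated assumptions (grayscale images, nearest-neighbour sampling, a known intrinsics matrix `K`); the function name and structure are illustrative, not the paper's implementation, which would use differentiable bilinear sampling.

```python
import numpy as np

def photometric_loss(target, source, depth, K, T):
    """Mean L1 photometric error after warping `source` into the target view.

    target, source : (H, W) grayscale images
    depth          : (H, W) predicted depth map of the target view
    K              : (3, 3) camera intrinsics
    T              : (4, 4) predicted target-to-source camera motion
    Nearest-neighbour sampling is used here for simplicity; a trainable
    model needs differentiable bilinear sampling instead.
    """
    H, W = target.shape
    # Homogeneous pixel grid of the target view.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    # Back-project pixels to 3-D points in the target camera frame.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    # Transform into the source frame and project back to pixels.
    proj = K @ (T @ pts_h)[:3]
    us = np.round(proj[0] / proj[2]).astype(int)
    vs = np.round(proj[1] / proj[2]).astype(int)
    # Keep only points that land inside the source image, in front of it.
    valid = (us >= 0) & (us < W) & (vs >= 0) & (vs < H) & (proj[2] > 0)
    t_flat = target.reshape(-1).astype(float)
    s_vals = source[vs[valid], us[valid]].astype(float)
    # Photometric error over the valid reconstructed pixels.
    return np.mean(np.abs(t_flat[valid] - s_vals))
```

With an identity pose, identical frames, and any positive depth the warp maps every pixel onto itself, so the loss is zero; gradients of this quantity with respect to depth and pose are what drive the unsupervised training.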
