Abstract

We propose UnLearnerVO, a joint unsupervised learning framework for monocular depth and camera motion estimation from videos. UnLearnerVO couples the two tasks through the relationships of 3D scene geometry and estimates the 6-DoF pose of a monocular camera end to end. The proposed UnLearnerVO has two significant features: an unsupervised depth learning pipeline based on both consecutive and non-consecutive frames, and robustness in scenarios with large camera motion. Specifically, we exploit a pose loop consistency loss, which optimizes the camera pose by enforcing consistency of the estimated poses across consecutive and non-consecutive frames. Furthermore, we propose a photometric loop consistency loss, which reduces the disturbance caused by factors such as moving objects and photometric inconsistency. Experiments on the KITTI dataset show that UnLearnerVO achieves state-of-the-art results in large camera motion scenarios and outperforms currently popular unsupervised approaches.
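
The idea behind a pose loop consistency term can be illustrated with a minimal sketch. The abstract does not give the loss formula, so everything below is an assumption made for illustration: poses are 4x4 homogeneous transforms, T_ab maps points from frame a to frame b, and the residual is a simple sum of a rotation and a translation norm. The function names pose_loop_residual and translation are hypothetical and are not taken from the paper.

```python
import numpy as np


def translation(x, y, z):
    """Hypothetical helper: a 4x4 rigid transform with translation only."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


def pose_loop_residual(T_12, T_23, T_13):
    """Illustrative loop residual (assumed form, not the paper's loss).

    T_ab maps points from frame a to frame b. Chaining the two consecutive
    estimates (1->2, then 2->3) should agree with the direct estimate
    between the non-consecutive frames (1->3).
    """
    T_loop = T_23 @ T_12                   # composed estimate of 1->3
    delta = np.linalg.inv(T_13) @ T_loop   # identity when the loop is consistent
    rot_err = np.linalg.norm(delta[:3, :3] - np.eye(3))  # Frobenius norm
    trans_err = np.linalg.norm(delta[:3, 3])
    return rot_err + trans_err


# Consistent loop: the direct pose equals the chained pose.
consistent = pose_loop_residual(
    translation(1, 0, 0), translation(0, 2, 0), translation(1, 2, 0)
)

# Inconsistent loop: the direct pose is off by one unit along z.
inconsistent = pose_loop_residual(
    translation(1, 0, 0), translation(0, 2, 0), translation(1, 2, 1)
)
```

In a training setting, a residual of this kind would be computed on network-predicted poses with a differentiable framework and added to the photometric terms; the weighting and exact parameterization used by UnLearnerVO are not stated in the abstract.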
