Abstract

We propose an unsupervised learning architecture that estimates depth, ego-motion, and optical flow under coupled consistency conditions. Earlier learning-based techniques in computer vision rely on large ground-truth datasets for network training, yet real-world ground truth for depth and optical flow is costly to collect and demands extensive pre-processing because of noise artifacts. In this paper, we propose a framework that trains the networks on unlabeled stereo video with combined losses derived from a coupled consistency structure. The core concept consists of two parts. First, we compare the optical flow induced by the estimated depth and ego-motion with the output of a dedicated flow estimation network. To suppress artifacts in the occluded regions of the estimated optical flow, we additionally enforce flow local consistency along the forward–backward directions. Second, a synthesis consistency term exploits the geometric correlation between the spatial and temporal domains of a stereo video. We perform extensive experiments on depth, ego-motion, and optical flow estimation on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset. We verify that the flow local consistency loss improves optical flow accuracy in occluded regions, and that the view-synthesis-based photometric loss enhances depth and ego-motion accuracy via scene projection. The estimated depth and optical flow achieve competitive performance, and the induced ego-motion is comparable to that obtained by other unsupervised methods.
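To make the coupled consistency structure concrete, the following is a minimal sketch in PyTorch (not the authors' released code); the function names, tensor shapes, and threshold values are illustrative assumptions. It shows (1) the rigid flow induced by depth and ego-motion compared against a flow network's prediction, (2) the forward–backward flow local consistency check used to mask occluded pixels, and (3) a view-synthesis photometric loss on the warped image.

```python
# Sketch of the coupled consistency losses (illustrative, not the paper's code).
import torch
import torch.nn.functional as F


def pixel_grid(b, h, w, device):
    """Homogeneous pixel coordinates, shape (B, 3, H*W)."""
    ys, xs = torch.meshgrid(
        torch.arange(h, device=device, dtype=torch.float32),
        torch.arange(w, device=device, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    return grid.unsqueeze(0).expand(b, -1, -1)


def rigid_flow(depth, pose, K, K_inv):
    """Optical flow induced by depth and ego-motion.

    depth: (B, 1, H, W); pose: (B, 3, 4) relative motion [R | t];
    K, K_inv: (B, 3, 3) camera intrinsics and their inverse.
    """
    b, _, h, w = depth.shape
    pix = pixel_grid(b, h, w, depth.device)              # (B, 3, HW)
    cam = K_inv.bmm(pix) * depth.reshape(b, 1, -1)       # back-project to 3D
    cam = pose[:, :, :3].bmm(cam) + pose[:, :, 3:]       # apply rigid motion
    proj = K.bmm(cam)
    proj = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)     # perspective divide
    return (proj - pix[:, :2]).reshape(b, 2, h, w)       # flow = pixel shift


def warp(img, flow):
    """Backward-warp img with flow via bilinear sampling."""
    b, _, h, w = img.shape
    pix = pixel_grid(b, h, w, img.device)[:, :2].reshape(b, 2, h, w)
    coords = pix + flow
    # grid_sample expects (x, y) normalized to [-1, 1]
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(img, torch.stack([gx, gy], dim=-1), align_corners=True)


def coupled_consistency_losses(img_t, img_s, depth, pose, K, K_inv,
                               flow_fwd, flow_bwd, alpha=0.01, beta=0.5):
    # 1) Cross-network flow consistency: the rigid flow from depth plus
    #    ego-motion should agree with the flow network in static regions.
    loss_flow = (rigid_flow(depth, pose, K, K_inv) - flow_fwd).abs().mean()

    # 2) Forward-backward (flow local) consistency: the forward flow plus
    #    the warped backward flow should cancel; large residuals indicate
    #    occlusion and are excluded from the penalties below.
    bwd_warped = warp(flow_bwd, flow_fwd)
    fb = flow_fwd + bwd_warped
    mag = flow_fwd.pow(2).sum(1, keepdim=True) + bwd_warped.pow(2).sum(1, keepdim=True)
    noc = (fb.pow(2).sum(1, keepdim=True) < alpha * mag + beta).float()
    loss_fb = (fb.abs() * noc).mean()

    # 3) Synthesis (photometric) consistency: warp the source view toward
    #    the target view and penalize the masked appearance difference.
    loss_photo = ((img_t - warp(img_s, flow_fwd)).abs() * noc).mean()

    return loss_flow, loss_fb, loss_photo
```

In the full method these terms would be weighted, combined with the stereo (spatial) photometric terms, and applied in both temporal directions; the occlusion mask keeps the forward–backward and photometric penalties from being dominated by pixels that leave the frame.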

Highlights

  • Estimating accurate scene depth, ego-motion, and optical flow is a challenging issue in autonomous driving and robotics

  • We evaluated our method on the KITTI dataset [23] against prior work on depth, odometry, and optical flow

  • We compared the performance of our depth estimation network with other state-of-the-art approaches on the standard splits of the KITTI dataset

Introduction

Estimating accurate scene depth, ego-motion, and optical flow is a challenging problem in autonomous driving and robotics. These properties are fundamental to computer vision and have extensive industrial applications, such as intelligent robotics [1] and simultaneous localization and mapping (SLAM) [2]. Past studies on modeling real-world scenes have faced the challenges of non-rigidity, occlusion, and light reflectance. Reconstructing a reliable model despite these obstacles depends on visual cues, such as motion and the shapes of specific objects. Precisely predicted depth is essential because it provides crucial information to computer vision applications such as driving assistance, object tracking, and three-dimensional (3D) reconstruction.
