Abstract
We present an occlusion-aware unsupervised neural network for jointly learning three low-level vision tasks from monocular videos: depth, optical flow, and camera motion. The system consists of three predictive sub-networks coupled during training by combined loss terms, yet it can compute each task independently on test samples. Geometric constraints derived from scene geometry, which have traditionally been used in bundle adjustment or pose-graph optimization, are formulated as self-supervisory signals in our end-to-end learning approach. Unlike prior works, our image reconstruction loss also takes optical flow into account. Moreover, we impose novel 3D flow consistency constraints over the predictions of all three tasks. By explicitly modeling occlusion and exploiting both 2D and 3D geometric relationships, abundant geometric constraints are imposed on the estimated outputs, enabling the system to capture both low-level representations and high-level cues and to infer thin scene structures. Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach: (1) monocular depth estimation outperforms state-of-the-art unsupervised methods and is comparable to stereo-supervised ones; (2) optical flow prediction ranks first among prior works and even surpasses supervised and traditional methods, especially in non-occluded regions; (3) pose estimation outperforms established SLAM systems by a reasonable margin under comparable input settings.
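To make the occlusion-aware, flow-based reconstruction term concrete, the following minimal PyTorch sketch shows a photometric loss that warps the source frame into the target frame with the predicted optical flow and evaluates the error only on visible pixels. The function names, tensor layout, and masked averaging are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def flow_warp(src, flow):
    """Backward-warp the source image (B,C,H,W) into the target frame using
    optical flow (B,2,H,W) that maps target pixels to source coordinates."""
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(src.device)        # (2,H,W), x then y
    coords = grid.unsqueeze(0) + flow                                  # sampling locations (B,2,H,W)
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)            # (B,H,W,2)
    return F.grid_sample(src, sample_grid, align_corners=True)

def occlusion_aware_photometric_loss(target, src, flow, visibility_mask):
    """L1 reconstruction loss between the target frame and the flow-warped
    source frame, averaged over pixels marked visible (mask = 1)."""
    warped = flow_warp(src, flow)
    per_pixel = (target - warped).abs().mean(dim=1, keepdim=True)      # L1 over channels
    return (per_pixel * visibility_mask).sum() / visibility_mask.sum().clamp(min=1.0)
```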
Highlights
An agent's ability to perceive its visual environment is more important than ever due to the rapid expansion of industries such as robotics [1], autonomous driving [2] and augmented reality [3], which require fulfilling many crucial tasks, including reasoning about one's ego-motion and the scene structure
Structure from motion (SfM) [4,5,6] is a long-standing task in computer vision that aims at reconstructing camera motion and scene structure, but it is often hard to integrate reasonable priors to handle small camera translations or outliers arising from low-level sparse correspondences
The main idea of this paper is to take optical flow into account alongside depth and camera motion estimation, jointly learning a network with both 2D and 3D geometric transformations during training to create additional constraints that better exploit the natural geometric structure of the scene
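The following sketch illustrates the 2D/3D coupling mentioned above: the "rigid flow" induced by predicted depth and camera pose alone, obtained by back-projecting target pixels to 3D, applying the predicted relative pose, and re-projecting into the source view. The PyTorch code, tensor shapes and function name are assumptions for illustration; the paper's exact implementation may differ.

```python
import torch

def rigid_flow_from_depth_pose(depth, pose, K):
    """Compute the 2D flow induced purely by camera motion.
    depth: (B,1,H,W) predicted depth, pose: (B,3,4) relative pose [R|t],
    K: (B,3,3) camera intrinsics."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack((xs, ys, torch.ones_like(xs)), dim=0).float()     # homogeneous pixels (3,H,W)
    pix = pix.reshape(1, 3, -1).expand(b, -1, -1).to(depth.device)       # (B,3,HW)
    cam = torch.inverse(K) @ pix * depth.reshape(b, 1, -1)               # back-projected 3D points
    cam_h = torch.cat((cam, torch.ones(b, 1, h * w, device=depth.device)), dim=1)
    proj = K @ (pose @ cam_h)                                            # re-projection into source view
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)                      # projected pixel coordinates
    rigid_flow = uv - pix[:, :2]                                         # pose-induced displacement
    return rigid_flow.reshape(b, 2, h, w)
```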
Summary
An agent's ability to perceive its visual environment is more important than ever due to the rapid expansion of industries such as robotics [1], autonomous driving [2] and augmented reality [3], which require fulfilling many crucial tasks, including reasoning about one's ego-motion and the scene structure. Visual simultaneous localization and mapping (VSLAM) systems [7,8,9] use handcrafted sparse or dense abstractions to represent geometry and are limited to particular kinds of scenes. They suffer from absolute-scale ambiguity in the monocular setting and from noisy data when a depth sensor is used. The main idea of this paper is to take optical flow into account alongside depth and camera motion estimation, jointly learning a network with both 2D and 3D geometric transformations during training to create additional constraints that better exploit the natural geometric structure of the scene. We learn an unsupervised deep neural network from monocular videos that predicts depth, optical flow, and camera motion simultaneously, coupled at training time by a combined loss term. The 3D constraint contains two parts: a 3D point alignment loss and a novel 3D flow loss
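The hedged sketch below shows one plausible form of the two 3D terms named above. It assumes that matched point clouds have already been lifted from the predicted depths, once via pose-induced (rigid-flow) correspondences and once via optical-flow correspondences; the names, inputs and exact formulation are illustrative assumptions rather than the authors' code.

```python
import torch

def masked_mean(residual, mask):
    # Sum of residuals over visible points, normalized by the number of visible points.
    return (residual * mask).sum() / mask.sum().clamp(min=1.0)

def three_d_losses(p_target, p_source_rigid_match, p_source_flow_match, pose, mask):
    """Two 3D terms over predicted depth, flow, and pose.
    p_target:             target-frame 3D points (B,3,N) from depth + intrinsics.
    p_source_rigid_match: source 3D points matched via the pose-induced rigid flow.
    p_source_flow_match:  source 3D points matched via the predicted optical flow.
    pose:                 predicted relative camera pose [R|t], (B,3,4).
    mask:                 per-point visibility mask (B,1,N) from the occlusion model."""
    b, _, n = p_target.shape
    ones = torch.ones(b, 1, n, device=p_target.device)
    p_target_moved = pose @ torch.cat((p_target, ones), dim=1)       # apply predicted ego-motion

    # (1) 3D point alignment: target points transformed by the predicted pose
    # should coincide with the source points at the rigid correspondences.
    point_align = masked_mean((p_target_moved - p_source_rigid_match).abs(), mask)

    # (2) 3D flow consistency: the 3D scene flow derived from optical flow and
    # depth should agree with the motion field implied by camera motion alone
    # on static, non-occluded regions.
    scene_flow = p_source_flow_match - p_target
    rigid_flow_3d = p_target_moved - p_target
    flow_consistency = masked_mean((scene_flow - rigid_flow_3d).abs(), mask)
    return point_align, flow_consistency
```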