Unsupervised Learning of Depth from Monocular Videos Using 3D-2D Corresponding Constraints

Fusheng Jin, Ye Yuan, Shuliang Wang, Chuanbing Wan, Yu Zhao

doi:10.3390/rs13091764

Abstract

Depth estimation can provide tremendous help for object detection, localization, path planning, etc. However, existing deep learning methods demand high computing power and often cannot be applied directly on autonomous moving platforms (AMP). Fifth-generation (5G) mobile and wireless communication systems have attracted the attention of researchers because they provide the network foundation for cloud computing and edge computing, which makes it possible to use deep learning methods on AMP. This paper proposes a depth prediction method for AMP based on unsupervised learning, which learns from video sequences and simultaneously estimates the depth structure of the scene and the ego-motion. Compared with existing unsupervised learning methods, our method makes the spatial correspondence among pixel points consistent with the image area by smoothing the 3D corresponding vector field based on the 2D image, which effectively improves the depth prediction ability of the neural network. Our experiments on the KITTI driving dataset demonstrated that our method outperformed previous learning-based methods. The results on the Apolloscape and Cityscapes datasets show that the proposed method generalizes well to other datasets.

Highlights

We propose a depth prediction method for autonomous moving platforms (AMP) based on unsupervised learning, which learns from video sequences and simultaneously estimates the depth structure of the scene and the ego-motion.
Since the depth of an object in an image cannot be calculated directly, simultaneous localization and mapping (SLAM) methods based on a monocular camera must first analyze how the positions of the same points relate across different views, and then estimate the ego-motion and the depth of the points in the scene simultaneously (for example, [6,7]).
To evaluate on the KITTI dataset, we needed to map the laser measurements into the graph space to generate the ground-truth depth corresponding to the original image.
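
The last highlight describes mapping laser measurements onto the image to obtain ground-truth depth. A minimal sketch of that idea is given here, assuming a simple pinhole camera and points that have already been transformed into the camera frame. The function name `project_points_to_depth` and the parameters `fx`, `fy`, `cx`, `cy` are illustrative and do not come from the paper; the actual KITTI procedure also involves calibration matrices that are omitted here.

```python
from typing import Dict, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[int, int]


def project_points_to_depth(
    points_cam: Sequence[Point3D],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
) -> Dict[Pixel, float]:
    """Project 3D points to a sparse depth map with a pinhole model.

    Assumption: points are already expressed in the camera frame
    (x right, y down, z forward). Returns {(u, v): depth}; pixels
    that receive no laser return are simply absent.
    """
    depth: Dict[Pixel, float] = {}
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # point lies behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # projection falls outside the image
        # When several points hit one pixel, keep the nearest one.
        if (u, v) not in depth or z < depth[(u, v)]:
            depth[(u, v)] = z
    return depth


if __name__ == "__main__":
    points = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (1.0, 0.0, 10.0)]
    sparse = project_points_to_depth(points, 100.0, 100.0, 50.0, 40.0, 100, 80)
    print(sparse)
```

The result is sparse because a laser scanner returns far fewer points than the image has pixels, so evaluation of this kind is normally restricted to pixels that have a measurement.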

Summary

Introduction

We propose a depth prediction method based on unsupervised learning. This method trains the neural network by analyzing the geometric constraints of the three-dimensional (3D) scene across a sequence of images and by constraining the correspondence of pixels among images according to the image gradient. It can simultaneously predict the depth structure of the scene and the ego-motion. Our method makes the spatial correspondence between pixel points consistent with the image area by smoothing the 3D corresponding vector field based on the 2D image. This effectively improves the depth prediction ability of the neural network. The results of the assessment indicate that our unsupervised method is superior to existing methods of the same type and achieves better quality than other recent self-supervised and supervised methods.
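
To make the idea of smoothing a 3D vector field according to the 2D image gradient more concrete, the sketch below implements a generic edge-aware smoothness penalty. It is not the paper's loss: the exponential weighting, the L1 differences, and the name `edge_aware_smoothness` are assumptions chosen for illustration. The penalty is large where the vector field changes inside a visually uniform region and small where the change coincides with an image edge.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def edge_aware_smoothness(
    image: Sequence[Sequence[float]],
    field: Sequence[Sequence[Vec3]],
) -> float:
    """Mean edge-weighted L1 difference between neighbouring vectors.

    image: grayscale intensities, indexed as image[row][col].
    field: one 3D vector per pixel, indexed the same way.
    Each horizontal or vertical neighbour pair contributes
    |field difference| * exp(-|image difference|).
    """
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for v in range(height):
        for u in range(width):
            # Compare with the right neighbour and the lower neighbour.
            for dv, du in ((0, 1), (1, 0)):
                v2, u2 = v + dv, u + du
                if v2 >= height or u2 >= width:
                    continue
                field_diff = sum(
                    abs(a - b) for a, b in zip(field[v2][u2], field[v][u])
                )
                image_grad = abs(image[v2][u2] - image[v][u])
                total += field_diff * math.exp(-image_grad)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    field = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]] * 2
    flat_image = [[0.0, 0.0], [0.0, 0.0]]
    edge_image = [[0.0, 10.0], [0.0, 10.0]]
    print(edge_aware_smoothness(flat_image, field))
    print(edge_aware_smoothness(edge_image, field))
```

In a training setup this kind of term would be computed on tensors and added to the reconstruction loss with a weighting factor. The plain-Python loops are used here only to keep the example free of dependencies.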

Related Work

The Proposed Approach

Differentiable Reprojection Error

Image Reconstruction Loss

Corresponding Consistency Loss

Learning Setup

Network Structure

Datasets Description

Experiment Settings

Comparisons with Other Methods

Evaluation of Depth Estimation

Evaluation of Ego-Motion

Depth Results on Apollo and Cityscapes

Conclusions

Journal: Remote Sensing	Publication Date: May 1, 2021
Citations: 3	License type: CC BY 4.0

Similar Papers

Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints
Reza Mahjourian ... Martin Wicke
01 Jun 2018

Unsupervised Learning of Depth and Pose Estimation based on Continuous Frame Window
Suning Shang ... Huaimin Wang
01 Jul 2018

APAC-Net: Unsupervised Learning of Depth and Ego-Motion from Monocular Video
Rui Lin ... Guangming Lu
01 Jan 2019

Automatic Recognition and Repair System of Mural Image Cracks Based on Cloud Edge Computing and Digitization
Yongli Gao ... Muhammad Zakarya
Mobile Information Systems | VOL. 2022
10 Oct 2022
