Self-Supervised 3D Reconstruction and Ego-Motion Estimation Via On-Board Monocular Video

Shaocheng Jia, Danya Yao, Xin Pei, Xiao Jing

doi:10.1109/tits.2021.3071428

Abstract

Recovering three-dimensional structure from a monocular camera is important for automated driving, robot navigation, and traffic safety assessment. Recent work has addressed several difficult issues in self-supervised monocular depth estimation from on-board videos, such as occlusion/disocclusion, dynamic objects, and scale inconsistency. Nevertheless, little work focuses on the model’s prediction confidence or on the underlying relations between depths, although these are essential for a decision-making system and for performance improvement, respectively. This paper proposes a novel scheme, the correlation-aware structure, to exploit the relations between depths, converting the independent depths into a graph-like connected depth map. Subsequently, a Gaussian estimator is devised to predict the depth map and the uncertainty map concurrently. The uncertainty map reveals problematic regions where prediction is difficult, from which we further develop uncertainty-based strategies to improve performance. Specifically, we propose a simple image preprocessing method to overcome the gradient locality issue caused by low texture, especially on smooth roads and in shadows. Also, to avoid the influence of high-uncertainty regions, we propose a solidity-aware mask that identifies the reliable pixels in the image for training. Experiments on the KITTI dataset show that our method achieves competitive performance in both depth and ego-motion estimation compared with state-of-the-art methods. In addition, experiments on the Make3D and Cityscapes datasets demonstrate our method’s strong generalization capability and practicality.

Full Text