Abstract

Feature prediction in self-supervised monocular depth estimation remains ambiguous in low-texture regions and at object boundaries. To address this problem, we propose an innovative self-supervised monocular depth estimation method, the monocular depth self-supervised network, which integrates three effective strategies into a single framework: (1) an attention mechanism and a feature fusion module enhance the semantic and spatial information of feature maps; (2) a threshold segmentation mask handles object motion and low-texture regions, preserving image detail; and (3) a residual pose module and a depth reconstruction loss strengthen the model's feature extraction capability, improving the accuracy of depth and pose estimation. Comprehensive experiments and visual analyses demonstrate the effectiveness of each component in isolation. Compared with existing self-supervised methods, our model not only achieves outstanding results on the KITTI and NYU Depth V2 datasets but also generalizes to different environments.
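As an illustration of strategy (2), the sketch below shows one common way a threshold mask of this kind can be realized, in the style of Monodepth2's auto-masking on per-pixel photometric error: pixels where warping the source frame does not reduce the error (typically moving objects or textureless regions) are excluded from the loss. The function name and the exact masking rule are assumptions for illustration, not the paper's implementation.

```python
import torch

def threshold_segmentation_mask(reproj_loss: torch.Tensor,
                                identity_loss: torch.Tensor) -> torch.Tensor:
    """Illustrative threshold mask (assumed Monodepth2-style auto-masking,
    not necessarily the paper's exact rule).

    reproj_loss:   (B, 1, H, W) photometric error of the warped source frame
    identity_loss: (B, 1, H, W) photometric error of the unwarped source frame

    Keeps pixels where warping lowers the error (the static-scene assumption
    holds); masks out pixels likely belonging to moving objects or
    low-texture regions, where warping brings no improvement.
    """
    return (reproj_loss < identity_loss).float()

# Usage: weight the photometric loss by the mask before averaging.
# mask = threshold_segmentation_mask(reproj_loss, identity_loss)
# loss = (mask * reproj_loss).sum() / mask.sum().clamp(min=1)
```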
