Abstract

Predicting depth from a single image has recently become an important research topic in computer vision. In particular, the self-supervised strategy for learning depth is attractive because it requires no ground-truth labels. Under the self-supervised learning framework, we propose a CA-depth network to improve the accuracy of single-image depth estimation. We add an attention mechanism to the monocular depth estimation network to address observable artifacts and inaccurate predicted geometry in monocular depth maps. The spatial position information in the high-dimensional feature map is used to attend to the essential features and to suppress artifacts in the predicted depth map. We use ResNet as the encoder to extract the input image's feature maps, a coordinate attention mechanism to re-weight the convolutional feature maps, and a decoding network to predict depth. Experimental results on public datasets show that the depth prediction accuracy of the CA-depth network is higher than that of state-of-the-art methods.
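As a rough illustration of the kind of attention applied to the encoder features, the sketch below implements a generic coordinate attention block (following Hou et al., 2021) in PyTorch. This is a minimal sketch under our own assumptions, not the authors' released code: the class name, the `reduction` parameter, and the placement on encoder feature maps are illustrative.

```python
# Minimal coordinate-attention sketch: pools spatial context along height and
# width separately, then re-weights the feature map channel- and position-wise.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Re-weights an encoder feature map using pooled H- and W-direction context."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # pool over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # pool over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.size()
        x_h = self.pool_h(x)                          # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)      # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        x_h, x_w = torch.split(y, [h, w], dim=2)
        x_w = x_w.permute(0, 1, 3, 2)                 # back to (B, mid, 1, W)
        a_h = torch.sigmoid(self.conv_h(x_h))         # attention along height
        a_w = torch.sigmoid(self.conv_w(x_w))         # attention along width
        return x * a_h * a_w                          # re-weighted feature map

# Example: re-weighting a hypothetical ResNet encoder feature map.
feat = torch.randn(1, 512, 12, 40)
out = CoordinateAttention(512)(feat)   # same shape as the input, (1, 512, 12, 40)
```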
