Abstract

Monocular depth estimation has become one of the most studied topics in computer vision. Most approaches treat depth prediction as a fully supervised regression problem, requiring large numbers of images with corresponding ground-truth depth maps for training. Unsupervised monocular depth estimation has emerged as a promising alternative that removes the need for ground-truth depth labels. This paper proposes an end-to-end unsupervised deep learning framework that integrates attention blocks and a multi-warp loss for monocular depth estimation. To capture more general contextual information across the feature volumes, an attention block that sequentially refines the feature maps along the channel and spatial dimensions is inserted after the first and last stages of the network encoder. In addition, to better exploit the errors in the network's initial disparity estimates, a novel multi-warp reconstruction strategy is designed for the loss function. Experimental results on the KITTI, CityScapes and Make3D datasets demonstrate the state-of-the-art performance and strong generalization ability of the proposed method.
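The abstract does not specify the internal design of the attention block beyond sequential channel-then-spatial refinement. As a rough illustration only, the sketch below shows a minimal NumPy version of that pattern (in the style of CBAM-like modules); all function names, weight shapes, and the simplified spatial fusion are assumptions, not the paper's actual architecture.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel refinement: pool over spatial dims, re-weight channels.
    feat: (C, H, W); w1: (C//r, C), w2: (C, C//r) form a shared 2-layer MLP.
    (Illustrative weights -- the paper does not give this design.)"""
    avg = feat.mean(axis=(1, 2))                 # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))                   # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # shared MLP with ReLU
    scale = _sigmoid(mlp(avg) + mlp(mx))         # (C,) per-channel gate in (0, 1)
    return feat * scale[:, None, None]

def spatial_attention(feat):
    """Spatial refinement: pool over channels, re-weight locations.
    A conv over the pooled maps is typical; a plain average stands in here."""
    avg = feat.mean(axis=0, keepdims=True)       # (1, H, W)
    mx = feat.max(axis=0, keepdims=True)         # (1, H, W)
    scale = _sigmoid((avg + mx) / 2.0)           # (1, H, W) per-pixel gate
    return feat * scale

def attention_block(feat, w1, w2):
    """Sequential channel-then-spatial refinement of a feature volume."""
    return spatial_attention(channel_attention(feat, w1, w2))
```

Applying `attention_block` to a `(C, H, W)` feature map returns a tensor of the same shape, with each channel and each spatial location softly re-weighted, which matches the "sequentially refines the feature maps along the channel and spatial dimensions" description in the abstract.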
