Abstract

At present, unsupervised monocular depth estimation methods suffer from limited accuracy and blurred object outlines. To address this problem, we propose a jointly unsupervised learning framework for monocular depth and camera motion estimation from video sequences. Specifically, we introduce an Atrous Spatial Pyramid Pooling (ASPP) module and an attention model. The former encodes multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter enables the network to preserve the shapes of objects and enhance the edges of the depth map. Experiments on the KITTI and Cityscapes datasets show that our method effectively improves the accuracy of monocular depth estimation, alleviates the boundary-blur problem, and preserves the details of the depth map.
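The abstract does not include the authors' implementation, but the ASPP design it describes is well established. Below is a minimal PyTorch sketch of a standard ASPP block in the DeepLabv3 style, assuming dilation rates (1, 6, 12, 18) and an image-level pooling branch; the rates, channel widths, and normalization used in the paper itself may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: probes the input with parallel
    atrous (dilated) convolutions at several rates plus a global-pooling
    branch, then fuses the multi-scale responses with a 1x1 projection.
    A sketch of the standard construction, not the paper's exact module."""

    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        for r in rates:
            # Rate 1 degenerates to a plain 1x1 convolution.
            k, p = (1, 0) if r == 1 else (3, r)
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=p, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True)))
        # Image-level branch: global average pooling captures full-image context.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))
        # Fuse all branches back to out_ch channels.
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [branch(x) for branch in self.branches]
        # Upsample the pooled branch back to the feature-map resolution.
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode='bilinear', align_corners=False)
        feats.append(pooled)
        return self.project(torch.cat(feats, dim=1))

# Usage: drop the block between a depth encoder and decoder.
if __name__ == "__main__":
    aspp = ASPP(in_ch=256, out_ch=256).eval()
    x = torch.randn(2, 256, 32, 104)  # e.g. bottleneck features of a KITTI frame
    print(aspp(x).shape)  # torch.Size([2, 256, 32, 104])
```

Because each branch sees the same input at a different effective field-of-view, concatenating them gives the depth decoder both fine local cues and broad scene context without reducing spatial resolution.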
