Abstract

Monocular depth estimation is a challenging problem in computer vision and is crucial for understanding 3D scene geometry. Recently, methods based on deep convolutional neural networks (DCNNs) have significantly improved estimation accuracy. However, existing methods fail to account for complex textures and geometries in scenes, resulting in lost local details, distorted object boundaries, and blurry reconstructions. In this paper, we propose an end-to-end multi-scale residual pyramid attention network (MRPAN) to mitigate these problems. First, we propose a multi-scale attention context aggregation (MACA) module, which consists of a spatial attention module (SAM) and a global attention module (GAM). By considering the position and scale correlations of pixels from spatial and global perspectives, the proposed module adaptively learns the similarity between pixels, capturing richer global context and recovering complex structures in the scene. We then propose an improved residual refinement module (RRM) to further refine the scene structure, extracting deeper semantic information while retaining more local details. Experimental results show that our method achieves more promising performance on object boundaries and local details than other state-of-the-art methods.
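The abstract does not spell out the internals of SAM and GAM, so the sketch below is only a rough illustration of the two attention perspectives it describes, using commonly adopted formulations: pixel-to-pixel self-attention for the spatial view and globally pooled channel attention for the global view. All class and parameter names here (`SpatialAttention`, `GlobalAttention`, `reduction`) are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Hypothetical SAM sketch: non-local self-attention over spatial
    positions, so each pixel aggregates features from similar pixels."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(x).flatten(2)                     # (B, C', HW)
        v = self.value(x).flatten(2)                   # (B, C, HW)
        attn = F.softmax(q @ k, dim=-1)                # (B, HW, HW) pixel similarity
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                    # residual connection

class GlobalAttention(nn.Module):
    """Hypothetical GAM sketch: reweight channels using a globally
    pooled descriptor of the whole image (squeeze-and-excitation style)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(F.adaptive_avg_pool2d(x, 1))  # (B, C, 1, 1) global context
        return x * w

# Usage: both modules preserve the feature map shape, so they can be
# stacked inside a larger aggregation module.
feat = torch.randn(2, 64, 32, 32)
feat = SpatialAttention(64)(feat)
feat = GlobalAttention(64)(feat)
print(feat.shape)  # torch.Size([2, 64, 32, 32])
```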
