Abstract

Depth estimation is a classic computer vision task and provides a rich representation of objects and their environment. In recent years, the performance of end-to-end depth estimation has improved significantly. However, stacked convolution and pooling operations lose local spatial detail, which is extremely important for monocular depth estimation. To overcome this problem, we propose an encoder-decoder framework with skip connections. Based on the self-attention mechanism, we apply a channel-spatial attention module as a transition layer, which captures depth and spatial positional relationships and improves the representational power along both the channel and spatial dimensions. We then propose a dense decoding module that makes full use of attention features across different scale ranges during decoding, achieving a larger and denser receptive field while gathering multi-scale information. Finally, a novel distance-aware loss is introduced to predict finer edges and local details in distant regions. Experiments demonstrate that the proposed method outperforms the state of the art on the KITTI and NYU Depth V2 datasets.
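The abstract does not specify the channel-spatial attention transition layer in detail; as a rough, hypothetical sketch of what such a module could look like (assuming a CBAM-style design in PyTorch, with all class and parameter names invented for illustration, not the authors' code):

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Hypothetical channel-spatial attention transition layer.

    Channel attention re-weights feature maps per channel; spatial attention
    re-weights each spatial position, emphasising contours of objects that
    lie at different depths.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dimensions, excite channels.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: compress channels to two maps, produce one weight per pixel.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)                   # channel re-weighting
        avg_map = x.mean(dim=1, keepdim=True)          # per-pixel mean over channels
        max_map, _ = x.max(dim=1, keepdim=True)        # per-pixel max over channels
        x = x * self.spatial_gate(torch.cat([avg_map, max_map], dim=1))
        return x
```

Such a layer would sit between encoder and decoder, re-weighting the bottleneck features before the dense decoding stage.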

Highlights

  • Depth information is vital for real-world 3D scenes, and depth estimation is an essential part of understanding the geometric relationships within them

  • We propose a dense decoding module (DDM), which cascades multiple dilated convolutional layers and feeds all preceding features as inputs in a densely connected up-sampling scheme (see the sketch after this list)

  • Based on the self-attention mechanism [33], [34], we introduce a spatial attention module that more accurately characterizes the contours of objects at different depths by modeling the relationship between any point and the global features
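
The highlights do not include an implementation of the dense decoding module; the following PyTorch sketch is a hypothetical illustration of the idea, cascading dilated convolutions and feeding all preceding features into each layer (class names, dilation rates, and the fusion step are assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class DenseDecodingBlock(nn.Module):
    """Hypothetical dense decoding block.

    Cascades dilated convolutions with increasing dilation rates and
    concatenates the block input with every earlier output, so each layer
    sees all preceding features and the effective receptive field grows.
    """

    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.layers = nn.ModuleList()
        for i, d in enumerate(dilations):
            in_ch = channels * (i + 1)  # block input plus all previous outputs
            self.layers.append(
                nn.Sequential(
                    nn.Conv2d(in_ch, channels, kernel_size=3, padding=d, dilation=d),
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True),
                )
            )
        self.fuse = nn.Conv2d(channels * (len(dilations) + 1), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        # Fuse all scales, then up-sample for the next decoding stage.
        out = self.fuse(torch.cat(feats, dim=1))
        return nn.functional.interpolate(
            out, scale_factor=2, mode="bilinear", align_corners=False
        )
```

Keeping every earlier output available at each step is what makes the receptive field both larger and denser while preserving multi-scale information during decoding.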

Introduction

Depth information is vital for real-world 3D scenes, and depth estimation is an essential part of understanding the geometric relationships within them. It can enhance many recognition tasks and has broad applications in multimodal emotion recognition, 3D reconstruction, simultaneous localization and mapping (SLAM), 3D object detection and autonomous driving. Deep convolutional neural networks have made great progress in image classification [3], semantic segmentation [4] and object detection [5], and more and more researchers are applying them to monocular depth estimation [6]–[9].
