Abstract

Monocular depth estimation is a classical but challenging task in computer vision. In recent years, Convolutional Neural Network (CNN) based models have been developed to estimate high-quality depth maps from a single image, and more recently, Transformer based models have brought further improvements. A common thread in this work is the search for a better way to process information globally, which is crucial for inferring depth relations but computationally expensive. In this paper, we combine the strengths of Transformers and CNNs and propose a novel network architecture, the Rich Global Feature Guided Network (RGFN), which extracts rich global features in both the encoder and the decoder. RGFN follows the typical encoder-decoder framework for dense prediction. A hierarchical Transformer serves as the encoder, capturing multi-scale contextual information and modeling long-range dependencies. In the decoder, Large Kernel Convolution Attention (LKCA) extracts global features at multiple scales and guides the network to progressively recover fine depth maps from low-resolution feature maps. In addition, we apply a depth-specific data augmentation method, Vertical CutDepth, to further boost performance. Experimental results on both indoor and outdoor datasets demonstrate the superiority of RGFN over other state-of-the-art models. Compared with the recent AdaBins method, RGFN improves the RMSE score by 4.66% on the KITTI dataset and 4.67% on the NYU Depth v2 dataset.
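
To make the decoder design concrete, below is a minimal PyTorch sketch of what a Large Kernel Convolution Attention block could look like, assuming a decomposition in the style of the Large Kernel Attention from Visual Attention Networks (depthwise conv, dilated depthwise conv, pointwise conv, used as a multiplicative attention map). The class name, kernel sizes, and dilation are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class LKCA(nn.Module):
    """Illustrative Large Kernel Convolution Attention block.

    Approximates a large (roughly 21x21) receptive field by stacking a
    depthwise conv, a dilated depthwise conv, and a 1x1 conv, then using
    the result as an attention map over the input features. This mirrors
    the common large-kernel-attention decomposition; the paper's exact
    LKCA design may differ.
    """

    def __init__(self, channels: int):
        super().__init__()
        # 5x5 depthwise conv captures local structure per channel.
        self.dw_conv = nn.Conv2d(channels, channels, kernel_size=5,
                                 padding=2, groups=channels)
        # 7x7 depthwise conv with dilation 3 cheaply extends the
        # receptive field (effective kernel ~19x19).
        self.dw_dilated = nn.Conv2d(channels, channels, kernel_size=7,
                                    padding=9, dilation=3, groups=channels)
        # 1x1 conv mixes information across channels.
        self.pw_conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw_conv(self.dw_dilated(self.dw_conv(x)))
        # Element-wise attention: reweight the input by the global map.
        return x * attn


if __name__ == "__main__":
    feats = torch.randn(1, 64, 30, 40)  # low-resolution decoder features
    print(LKCA(64)(feats).shape)        # torch.Size([1, 64, 30, 40])
```

Spatial resolution is preserved, so a block like this can be dropped into each decoder stage to inject global context before upsampling.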
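The Vertical CutDepth augmentation can likewise be sketched. The sketch below assumes the variant in which a full-height vertical strip of the (normalized) ground-truth depth map is pasted into the RGB input, so depth cues are exposed while the vertical image geometry that correlates with depth is preserved. The function name, probability, and strip-width range are hypothetical defaults, not the paper's settings.

```python
import torch


def vertical_cutdepth(image: torch.Tensor, depth: torch.Tensor,
                      p: float = 0.75, max_ratio: float = 0.5) -> torch.Tensor:
    """Illustrative Vertical CutDepth augmentation (parameters assumed).

    image: (3, H, W) RGB tensor in [0, 1]
    depth: (1, H, W) ground-truth depth tensor
    """
    if torch.rand(1).item() > p:
        return image  # apply the augmentation only with probability p
    _, h, w = image.shape
    # Pick a random full-height vertical strip to replace.
    strip_w = int(w * torch.empty(1).uniform_(0.1, max_ratio).item())
    x0 = torch.randint(0, w - strip_w + 1, (1,)).item()
    # Normalize depth to [0, 1] and replicate it to 3 channels.
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    out = image.clone()
    out[:, :, x0:x0 + strip_w] = d.expand(3, h, w)[:, :, x0:x0 + strip_w]
    return out
```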
