CATNet: Convolutional attention and transformer for monocular depth estimation

Shuai Tang, Tongwei Lu, Xuanxuan Liu, Huabing Zhou, Yanduo Zhang

doi:10.1016/j.patcog.2023.109982

Abstract

Monocular depth estimation has attracted growing attention because of its wide range of applications. In this paper, we propose a simple framework, CATNet, that treats monocular depth estimation as an ordinal regression problem. Current research on monocular depth estimation typically obtains higher performance by increasing the computation and parameter count of the model. In response, we propose a simple encoder–decoder architecture that reduces parameters and complexity relative to SOTA models while keeping depth estimation accuracy as high as possible, rather than aiming for an extremely lightweight design. To further refine the multi-scale information extracted by the encoder, we propose a Multi-dimensional Convolutional Attention (MCA) module. To strengthen the extraction of global information for accurate pixel classification, we propose a Dual Attention Transformer (DAT) module that extracts global image features. Experimental results on the KITTI and NYU datasets show that the proposed framework achieves depth estimation performance nearly equivalent to the current SOTA with fewer parameters and lower complexity. To the best of our knowledge, CATNet is the first work to achieve nearly the same depth estimation accuracy as large Transformer-based encoder models with so few parameters.
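As a rough illustration of what treating depth estimation as ordinal regression can mean at the level of a single pixel, the sketch below discretizes a depth range into ordered bins and converts between a continuous depth and a vector of binary "deeper than threshold k" targets. The abstract does not state how CATNet discretizes depth or decodes its predictions, so the log-spaced bins, the 0.5 decision threshold, the bin-centre decoding, and all function names here are assumptions chosen for illustration, not the authors' method.

```python
import math

def log_space_edges(d_min, d_max, num_bins):
    """Return num_bins + 1 bin edges spaced uniformly in log-depth (assumed scheme)."""
    log_min, log_max = math.log(d_min), math.log(d_max)
    step = (log_max - log_min) / num_bins
    return [math.exp(log_min + i * step) for i in range(num_bins + 1)]

def depth_to_ordinal(depth, edges):
    """Encode a depth as K-1 binary targets: target k is 1 if depth exceeds interior edge k."""
    return [1 if depth > edge else 0 for edge in edges[1:-1]]

def ordinal_to_depth(probs, edges):
    """Decode K-1 threshold probabilities into a depth.

    Counts how many thresholds are judged exceeded (p > 0.5) to pick a bin,
    then returns the geometric centre of that bin.
    """
    bin_index = sum(1 for p in probs if p > 0.5)
    return math.sqrt(edges[bin_index] * edges[bin_index + 1])

# Hypothetical range of 1 m to 81 m split into 4 bins: edges are approximately 1, 3, 9, 27, 81.
edges = log_space_edges(1.0, 81.0, 4)
targets = depth_to_ordinal(10.0, edges)             # 10 m exceeds the edges near 3 and 9 only
decoded = ordinal_to_depth([0.9, 0.8, 0.2], edges)  # two thresholds exceeded, so third bin
```

In a network of this kind, the per-pixel threshold probabilities would come from the decoder output; the ordered encoding means that a prediction one bin away from the truth disagrees with the target in only one position, which is the property that distinguishes ordinal regression from plain classification.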

Journal: Pattern Recognition
Publication Date: Sep 19, 2023
Citations: 5

Similar Papers

Unsupervised Learning of Depth Estimation Based on Attention Model from Monocular Images
Chao Zhang ... Jiaqi Liu
01 Nov 2020

Self-supervised learning of monocular depth using quantized networks
Keyu Lu ... Yonghu Zeng
Neurocomputing | VOL. 488
06 Dec 2021

Monocular depth estimation using self-supervised learning with more effective geometric constraints
Mingkang Xiong ... Huilin Xiong
Engineering Applications of Artificial Intelligence | VOL. 128
15 Nov 2023

AMENet is a monocular depth estimation network designed for automatic stereoscopic display.
Tianzhao Wu ... Zengyuan Chen
Scientific Reports | VOL. 14
11 Mar 2024
