TSD-Depth: Using transformers and self-distilling for self-supervised indoor depth estimation

Chen Lv, Chenggong Han, Junhui Chen, Deqiang Cheng, Jiansheng Qian

doi:10.1016/j.ijleo.2023.171219

Abstract

Supervised monocular depth estimation has always been one of the most important tasks in computer vision. With the convolution module as the basic operator, the U-shaped network architecture has become the de facto standard and has achieved tremendous success. However, because the convolution operation has a limited receptive field, CNNs are generally weak at explicitly modeling long-range dependencies. Transformers, originally proposed for natural language processing, perform sequence-to-sequence prediction with a global self-attention mechanism and can therefore capture long-range dependencies. However, their localization ability is limited because they lack low-level detail. In this work, we propose TSD-Depth, a model that combines the merits of transformers and CNNs, as a strong alternative for self-supervised monocular depth estimation. The proposed model simultaneously extracts global contextual information and local spatial detail features. Furthermore, a hybrid encoder connection method and a properly sized transformer module let the global and local information interact more effectively. In addition, a local multi-scale fusion block is proposed for the first time to refine fine-grained details. More importantly, self-distillation transfers the knowledge of the multi-scale fusion block, which is concatenated with the encoder, so that the block is computed only during training and skipped at inference time, keeping the overhead minimal. Experimental results on the NYU-v2 and ScanNet datasets show that TSD-Depth achieves the best performance compared with previous state-of-the-art methods.
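The idea of a training-only branch that is skipped at inference can be illustrated with a toy sketch. The abstract does not give the distillation loss or the layer design, so everything below is an assumption made for illustration: the function names (`student_head`, `fusion_head`, `forward`), the 3-tap smoothing that stands in for a multi-scale fusion block, and the L1 distillation loss are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of self-distillation with a training-only branch.
# None of these names, layers, or the L1 loss come from the paper.

def student_head(features):
    """Lightweight path kept at inference: a fixed scaling of the features."""
    return [0.5 * f for f in features]


def fusion_head(features):
    """Training-only path standing in for a fusion block (assumed design):
    average each value with its immediate neighbours, then scale."""
    n = len(features)
    fused = []
    for i in range(n):
        window = features[max(0, i - 1):min(n, i + 2)]
        fused.append(sum(window) / len(window))
    return [0.5 * f for f in fused]


def l1_distillation_loss(student_pred, teacher_pred):
    """Mean absolute difference; teacher_pred is treated as a fixed target."""
    total = sum(abs(s - t) for s, t in zip(student_pred, teacher_pred))
    return total / len(student_pred)


def forward(features, training):
    """Return (prediction, distillation_loss); the loss is None at inference."""
    student = student_head(features)
    if not training:
        # The fusion branch is never evaluated here, so it adds no inference cost.
        return student, None
    teacher = fusion_head(features)
    return student, l1_distillation_loss(student, teacher)


if __name__ == "__main__":
    feats = [2.0, 4.0, 6.0]
    print(forward(feats, training=True))
    print(forward(feats, training=False))
```

In a real network the loss would be minimized by gradient descent so that the lightweight path learns to reproduce the richer branch's output; the sketch only shows where the extra branch is computed and where it is skipped.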

Similar Papers

Swin-Depth: Using Transformers and Multi-Scale Fusion for Monocular-Based Depth Estimation
Zeyu Cheng ... Yi Zhang
IEEE Sensors Journal | VOL. 21
01 Dec 2021

Attention-Based Monocular Depth Estimation Considering Global and Local Information in Remote Sensing Images
Junwei Lv ... Bin Lei
Remote Sensing | VOL. 16
04 Feb 2024

Deep learning for monocular depth estimation: A review
Yue Ming ... Hui Yu
Neurocomputing | VOL. 438
05 Jan 2021

Encoder-Decoder Structure with Multiscale Receptive Field Block for Unsupervised Depth Estimation from Monocular Video
Songnan Chen ... Mengxia Tang
Remote Sensing | VOL. 14
17 Jun 2022

Journal: Optik
Publication Date: Jul 26, 2023
Citations: 2
