Joint Soft-Hard Attention for Self-Supervised Monocular Depth Estimation.

Chao Fan, Feiqing Zhang, Fulong Xu, Zhenyu Yin, Anying Chai

doi:10.3390/s21216956

Abstract

In recent years, self-supervised monocular depth estimation has gained popularity among researchers because it uses only a single camera at a much lower cost than the direct use of laser sensors to acquire depth. Although monocular self-supervised methods can obtain dense depths, the estimation accuracy needs to be further improved for better applications in scenarios such as autonomous driving and robot perception. In this paper, we innovatively combine soft attention and hard attention with two new ideas to improve self-supervised monocular depth estimation: (1) a soft attention module and (2) a hard attention strategy. We integrate the soft attention module in the model architecture to enhance feature extraction in both spatial and channel dimensions, adding only a small number of parameters. Unlike traditional fusion approaches, we use the hard attention strategy to enhance the fusion of generated multi-scale depth predictions. Further experiments demonstrate that our method can achieve the best self-supervised performance both on the standard KITTI benchmark and the Make3D dataset.

Highlights

Obtaining depth from an RGB image is essentially an ill-posed problem, but recent studies have shown that convolutional neural networks (CNNs) can estimate depth by learning the hidden depth cues in RGB images.
As with most previous papers, we used the KITTI benchmark for training and evaluation.
Since self-supervised training based on monocular video has no scale information, we report the results using per-image median ground-truth scaling [7].
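
The per-image median scaling mentioned in the last highlight can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `median_scale`, the validity mask, and the 1e-3 m to 80 m depth range are assumptions chosen for illustration (the 80 m cap is a common KITTI convention, not something stated here). The idea is that each predicted depth map is multiplied by the ratio of the ground-truth median to the predicted median before error metrics are computed.

```python
import numpy as np


def median_scale(pred_depth, gt_depth, min_depth=1e-3, max_depth=80.0):
    """Rescale a scale-ambiguous depth prediction using the ground-truth median.

    pred_depth, gt_depth: arrays of the same shape (one image).
    min_depth, max_depth: assumed validity range for ground-truth pixels.
    Returns the rescaled prediction and the scale ratio that was applied.
    """
    # Only pixels with usable ground truth take part in the median.
    valid = (gt_depth > min_depth) & (gt_depth < max_depth)
    ratio = np.median(gt_depth[valid]) / np.median(pred_depth[valid])
    # Apply the per-image ratio, then clamp to the assumed depth range.
    scaled = np.clip(pred_depth * ratio, min_depth, max_depth)
    return scaled, ratio


# Usage: a prediction that is correct up to an unknown global scale.
rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 50.0, size=(4, 6))
pred = gt / 5.0  # same structure, wrong scale
scaled, ratio = median_scale(pred, gt)
```

Because the ratio is computed separately for every image, this protocol measures the quality of relative depth only; it says nothing about whether the network recovers absolute scale.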

Summary

Introduction

There are various ways to obtain depth information, such as laser sensors, binocular cameras, or multi-camera setups. These may be unavailable in some cases because of high cost or demanding environmental requirements. Monocular images and videos are more common in most scenes. This has led to the rapid development of monocular depth estimation, as only a single camera is required. Self-supervised learning methods [7,8,9,10] construct new supervisory signals by using depth predictions as intermediate variables through the geometric constraints of monocular video or stereo images.



Journal: Sensors
Publication Date: Oct 20, 2021
Citations: 4
License type: CC BY 4.0


