GeometryFormer: Semi-Convolutional Transformer Integrated with Geometric Perception for Depth Completion in Autonomous Driving Scenes.

Siyuan Su, Jian Wu

doi:10.3390/s24248066

Open Access

https://doi.org/10.3390/s24248066

Journal: Sensors (Basel, Switzerland)
Publication Date: Dec 18, 2024
License type: CC BY 4.0

Abstract

Depth completion is widely used in Simultaneous Localization and Mapping (SLAM) and Structure from Motion (SfM), both of which are important to the development of autonomous driving. Recently, methods that fuse a vision transformer (ViT) with convolution have raised accuracy to a new level, but two shortcomings remain. First, ViT performs poorly on fine details. To address this, this paper proposes a semi-convolutional vision transformer that improves local continuity, and designs a geometric perception module that learns the positional correlation and geometric features of sparse points in three-dimensional space. The module perceives the geometric structures in depth maps and thereby improves the recovery of edges and transparent areas. Second, previous methods use single-stage fusion, directly concatenating or adding the outputs of ViT and convolution. The two are fused incompletely as a result, which produces many outliers and ripples, especially in complex outdoor scenes. This paper proposes a novel double-stage fusion strategy that applies a learnable confidence after self-attention to flexibly learn the weight of local features. Our network achieves state-of-the-art (SoTA) performance on the NYU-Depth-v2 dataset and the KITTI Depth Completion dataset. Notably, the root mean square error (RMSE) of our model on NYU-Depth-v2 is 87.9 mm, currently the best among all algorithms. At the end of the article, we also verify the model's generalization ability in real road scenes.
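
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an RMSE in millimetres is commonly computed for depth completion. It is not taken from the paper: the function name `rmse_mm`, the flat-list input layout, depths given in metres, and the convention that a ground-truth value of zero or less marks a missing pixel are all assumptions made for illustration.

```python
import math


def rmse_mm(pred_m, gt_m):
    """Root mean square error in millimetres over valid ground-truth pixels.

    pred_m, gt_m: equal-length sequences of depths in metres (assumed layout).
    A ground-truth depth <= 0 is treated as "no measurement" and skipped
    (an assumed convention, not a detail from the paper).
    """
    if len(pred_m) != len(gt_m):
        raise ValueError("prediction and ground truth must have the same length")
    # Squared error only where ground truth exists.
    sq_errors = [(p - g) ** 2 for p, g in zip(pred_m, gt_m) if g > 0]
    if not sq_errors:
        raise ValueError("no valid ground-truth pixels")
    # Convert metres to millimetres after taking the root.
    return 1000.0 * math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy example: the third pixel has no ground truth and is ignored.
pred = [1.0, 2.1, 3.0, 4.0]
gt = [1.0, 2.0, 0.0, 4.2]
print(round(rmse_mm(pred, gt), 1))
```

Masking out pixels without ground truth matters because depth sensors rarely provide a measurement at every pixel. Counting the missing pixels as zero-depth targets would distort the error.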
