Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion

Zhongyi Xia, Tianzhao Wu, Zhuoyan Wang, Man Zhou, Boqi Wu, C Y Chan, Ling Bing Kong

doi:10.1038/s41598-024-57908-z

Abstract

Stereoscopic display technology plays a significant role in industries such as film, television, and autonomous driving. The accuracy of depth estimation is crucial for achieving high-quality, realistic stereoscopic display effects. To address the inherent challenges of applying Transformers to depth estimation, the Stereoscopic Pyramid Transformer-Depth (SPT-Depth) is introduced. The method uses stepwise downsampling to acquire both shallow and deep semantic information, which are subsequently fused. Training is divided into fine and coarse convergence stages that employ distinct training strategies and hyperparameters, resulting in a substantial reduction in both training and validation losses. In the training strategy, a shift- and scale-invariant mean square error function is employed to compensate for the lack of translational invariance in Transformers. Additionally, an edge-smoothing function is applied to reduce noise in the depth map, enhancing the model's robustness. SPT-Depth achieves a global receptive field while effectively reducing time complexity. On the New York University Depth V2 (NYU Depth V2) dataset, it achieves a 10% reduction in Absolute Relative Error (Abs Rel) and a 36% decrease in Root Mean Square Error (RMSE) relative to the baseline method. Compared with state-of-the-art methods, there is a 17% reduction in RMSE.
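The abstract names two evaluation metrics (Abs Rel and RMSE) and a shift- and scale-invariant mean square error, but gives no formulas. The sketch below uses the standard definitions of the two metrics. For the invariant error it assumes the common least-squares formulation, in which a scale `s` and shift `b` are fitted to the prediction before the squared error is taken. That formulation is an assumption and may differ from the authors' exact loss. All function names are hypothetical, and depth maps are represented as flat lists of floats for simplicity.

```python
import math


def abs_rel(pred, target):
    """Absolute Relative Error: mean of |pred - target| / target.

    Pixels with non-positive target depth are treated as invalid and skipped.
    """
    pairs = [(p, t) for p, t in zip(pred, target) if t > 0]
    return sum(abs(p - t) / t for p, t in pairs) / len(pairs)


def rmse(pred, target):
    """Root Mean Square Error over valid (target > 0) pixels."""
    pairs = [(p, t) for p, t in zip(pred, target) if t > 0]
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


def align_scale_shift(pred, target):
    """Closed-form least-squares fit of s, b minimizing sum((s*p + b - t)^2).

    Assumed formulation, not taken from the paper.
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        # Constant prediction: only the shift can be fitted.
        return 0.0, mean_t
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    s = cov / var_p
    b = mean_t - s * mean_p
    return s, b


def ssi_mse(pred, target):
    """Mean square error after the best scale and shift are applied to pred."""
    s, b = align_scale_shift(pred, target)
    return sum((s * p + b - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.0, 4.0]
    target = [2.0 * p + 0.5 for p in pred]  # same structure, other scale/shift
    print("plain RMSE:", rmse(pred, target))
    print("scale/shift-invariant MSE:", ssi_mse(pred, target))
```

In the usage example the prediction differs from the target only by a scale and an offset, so the plain RMSE is large while the invariant error vanishes. This is the property that makes such a loss tolerant of global depth scale and offset.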


Journal: Scientific Reports	Publication Date: Mar 25, 2024
Citations: 1	License type: CC BY 4.0

Similar Papers

Monocular Depth Estimation Using a Laplacian Image Pyramid with Local Planar Guidance Layers.
Youn-Ho Choi ... Seok-Cheol Kee
Sensors (Basel, Switzerland) | VOL. 23
11 Jan 2023

Assimilation of soil moisture data from cosmic-ray neutron sensors into the integrated Terrestrial System Modeling Platform TSMP (case study: Rur catchment in Germany)
Fang Li ... Wolfgang Kurtz
08 May 2023

Large topsoil organic carbon variability is controlled by Andisol properties and effectively assessed by VNIR spectroscopy in a coffee agroforestry system of Costa Rica
Rintaro Kinoshita ... Harold M Van Es
Geoderma | VOL. 262
05 Sep 2015

Can a Sparse Network of Cosmic Ray Neutron Sensors Improve Soil Moisture and Evapotranspiration Estimation at the Larger Catchment Scale?
Fang Li ... Harrie‐Jan Hendricks Franssen
Water Resources Research | VOL. 60
01 Jan 2024
