Abstract

Monocular depth estimation recovers the depth information of a 3D scene from a single 2D image captured by a camera. In this paper, a multi-task training framework that combines semantic segmentation and depth estimation is developed to improve monocular depth estimation performance. The traditional joint training framework for semantics and depth, however, requires training data with joint annotations, i.e., both semantic labels and depth annotations, and hardly any large public datasets providing such joint annotations are available. To address this problem, a training framework with a feature correlation screening and linkage mechanism based on the linear independence of the Gram matrix, called GSFA-MDEN (Gram Semantic-Feature-Aided Monocular Depth Estimation Network), is developed and trained with the TSTB (Two-Stages-Two-Branches) strategy. GSFA-MDEN consists of two branches, DepthNet and SemanticsNet, which are first trained on two different large datasets, each with its own annotation type. The overall network is then constructed by fusing the features of the two branches based on the Gram nonlinear correlation, which establishes a quantitative representation of the correlation between semantic features and depth features. Compared with the original DepthNet on the KITTI dataset, GSFA-MDEN decreases the Root Mean Square Error (RMSE) from 5.808 m to 5.370 m by adding SemanticsNet-assisted depth estimation, and the RMSE is further decreased to 5.167 m by employing the Gram nonlinear correlation to exploit the correlation between the features of the different tasks. These experimental results illustrate the superiority of GSFA-MDEN.
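As a rough illustration of the idea described above, the sketch below shows one way a Gram-matrix-based correlation between semantic and depth feature maps could be computed and used to gate the fusion of the two branches. It is a minimal, hypothetical example assuming PyTorch tensors of shape (B, C, H, W); the function names, the cosine-similarity gating rule, and the additive fusion are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise Gram matrix of a feature map: (B, C, H, W) -> (B, C, C)."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (h * w)


def gram_correlation_fuse(depth_feat: torch.Tensor,
                          sem_feat: torch.Tensor) -> torch.Tensor:
    """Fuse semantic features into the depth branch, weighted by the
    similarity of the two branches' Gram matrices.
    Illustrative only; GSFA-MDEN's actual fusion rule may differ."""
    g_d = gram_matrix(depth_feat)                       # (B, C, C)
    g_s = gram_matrix(sem_feat)                         # (B, C, C)
    # Cosine-style similarity between the flattened Gram matrices per sample.
    sim = F.cosine_similarity(g_d.flatten(1), g_s.flatten(1), dim=1)  # (B,)
    w = sim.clamp(min=0).view(-1, 1, 1, 1)              # gate in [0, 1]
    return depth_feat + w * sem_feat                    # correlation-gated fusion


if __name__ == "__main__":
    # Random tensors standing in for DepthNet / SemanticsNet feature maps.
    d = torch.randn(2, 64, 32, 104)
    s = torch.randn(2, 64, 32, 104)
    print(gram_correlation_fuse(d, s).shape)  # torch.Size([2, 64, 32, 104])
```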
