Abstract

Monocular depth estimation refers to recovering the depth of a 3D scene from a single 2D image taken by a camera. In this paper, a multi-task training framework combining semantic segmentation and depth estimation is developed to improve monocular depth estimation performance. However, the traditional joint training framework for semantics and depth requires joint annotations, i.e., both semantic labels and depth annotations, on the training dataset, and unfortunately few large public datasets provide such joint annotations. To address this problem, GSFA-MDEN (Gram Semantic-Feature-Aided Monocular Depth Estimation Network), a training framework with a feature correlation screening and linkage mechanism based on the linear independence of the Gram matrix, is developed and trained with a TSTB (Two-Stages-Two-Branches) training strategy. GSFA-MDEN consists of two branches, DepthNet and SemanticsNet, which are first trained on two different large datasets, each with its own annotations. The overall network is then constructed by fusing the features of the two branches based on the Gram nonlinear correlation, which provides a quantitative representation of the correlation between semantic features and depth features. Compared with the original DepthNet on the KITTI dataset, GSFA-MDEN reduces the Root Mean Square Error (RMSE) from 5.808 m to 5.370 m by adding SemanticsNet-assisted depth estimation, and the RMSE is further reduced to 5.167 m by employing the Gram nonlinear correlation to exploit the correlation between the features of the two tasks. These experimental results demonstrate the superiority of GSFA-MDEN.
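
The abstract does not spell out how the Gram-based correlation is computed, so the following is only a minimal illustrative sketch, assuming a PyTorch-style implementation: channel-wise Gram matrices are computed for the depth-branch and semantic-branch feature maps, a normalized Frobenius inner product (cosine similarity) stands in for the paper's nonlinear correlation measure, and the function names, the fusion rule, and the tensor shapes are all hypothetical.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise Gram matrix of a feature map.

    feat: (B, C, H, W) -> (B, C, C), normalized by the spatial size H*W.
    """
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (h * w)

def gram_correlation(depth_feat: torch.Tensor, sem_feat: torch.Tensor) -> torch.Tensor:
    """Scalar correlation between depth and semantic Gram matrices.

    Measured here as cosine similarity between the flattened Gram
    matrices; the paper's actual nonlinear correlation may differ.
    """
    g_d = gram_matrix(depth_feat).flatten(1)
    g_s = gram_matrix(sem_feat).flatten(1)
    return F.cosine_similarity(g_d, g_s, dim=1)  # shape (B,)

def fuse(depth_feat: torch.Tensor, sem_feat: torch.Tensor) -> torch.Tensor:
    """Weight the semantic features by their Gram correlation with the
    depth features before adding them to the depth branch (illustrative
    fusion rule, not the paper's)."""
    w = gram_correlation(depth_feat, sem_feat).clamp(min=0).view(-1, 1, 1, 1)
    return depth_feat + w * sem_feat

# Hypothetical usage with random feature maps from the two branches:
d = torch.randn(2, 64, 32, 104)   # DepthNet features
s = torch.randn(2, 64, 32, 104)   # SemanticsNet features
fused = fuse(d, s)                # (2, 64, 32, 104)
```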
