Abstract

Geometric and semantic contexts are essential to solving the ill-posed problem of monocular depth estimation (MDE). In this paper, we propose a deep framework that aggregates dual-modal structural contexts for monocular depth estimation (DSC-MDE). First, a cross-shaped context (CSC) aggregation module is developed to globally encode the geometric structures in depth maps observed within the fields of view of robots and autonomous vehicles. Next, the CSC-encoded geometric features are further modulated with semantic context in an object-regional context (ORC) aggregation module. Finally, to train the proposed network, we present a focal ordinal loss (FOL), which pays more attention to distant samples and thereby avoids the over-relaxed constraints that the ordinal regression loss (ORL) imposes on them. We compare the proposed model against recent methods that exploit geometric and multi-modal contexts, and show that it achieves state-of-the-art performance on both indoor and outdoor datasets, including NYU-Depth-V2, Cityscapes, and KITTI.
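To make the loss concrete, below is a minimal sketch of how a focal ordinal loss of this kind might look, assuming a DORN-style per-threshold ordinal formulation. The function name focal_ordinal_loss, the focal exponent gamma, and the exact weighting scheme are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn.functional as F

def focal_ordinal_loss(logits, gt_bins, gamma=2.0):
    # logits:  (B, K, H, W); sigmoid(logits[:, k]) approximates P(depth > t_k)
    #          under the assumed DORN-style per-threshold formulation.
    # gt_bins: (B, H, W) integer ground-truth depth-bin index per pixel.
    # gamma:   focal exponent; larger values up-weight poorly fitted thresholds.
    K = logits.size(1)
    ks = torch.arange(K, device=logits.device).view(1, K, 1, 1)
    # Ordinal targets: threshold k is "on" iff the true bin lies beyond it.
    targets = (gt_bins.unsqueeze(1) > ks).float()
    # Probability the model assigns to the correct side of each threshold.
    p = torch.sigmoid(logits)
    p_correct = torch.where(targets > 0.5, p, 1.0 - p)
    # Focal modulation: thresholds that remain badly fitted (typically the many
    # thresholds crossed by distant pixels) dominate the loss instead of being
    # averaged away, as happens in the plain ordinal regression loss.
    focal_weight = (1.0 - p_correct).pow(gamma)
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (focal_weight * bce).mean()

With gamma set to zero this reduces to the standard ORL; the focal factor concentrates the gradient on under-fitted samples, which is the reweighting idea the abstract attributes to FOL, though the paper's actual formulation may differ.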
