Abstract

Humans are able to estimate the depth of objects in their environment even when using only one eye, through what are known as monocular cues. In this paper, we aim to integrate human knowledge and human-like reasoning for monocular depth estimation into deep neural networks. The idea is to support the network so that it learns the essential cues for the target task explicitly and quickly. For this purpose, we investigate the possibility of directly integrating geometric, semantic, and contextual information into the monocular depth estimation process. We propose exploiting an ontology model in a deep learning context to represent the urban environment as a structured set of concepts linked by semantic relationships. Monocular cue information is extracted through reasoning performed on the proposed ontology and is fed, together with the RGB image, in a multistream fashion into the deep neural network for depth estimation. Our approach is validated and evaluated on widespread benchmark datasets: KITTI, Cityscapes, and ApolloScape. The results show that the proposed method improves upon state-of-the-art monocular depth estimation deep models and achieves promising results in cross-evaluation, mainly for unseen driving scenarios.
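
The multistream design described above can be pictured as two encoder branches, one for the RGB image and one for the monocular-cue maps produced by ontology reasoning, whose features are fused before depth regression. The following is a minimal, hypothetical PyTorch sketch of that idea; the layer sizes, channel counts, and concatenation-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of a multistream depth network: an RGB stream and a
# monocular-cue stream (channels derived from ontology reasoning, e.g. one
# channel per cue) are encoded separately, fused, and regressed to a depth map.
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU, keeping spatial resolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class MultiStreamDepthNet(nn.Module):
    def __init__(self, cue_channels: int = 3):
        super().__init__()
        # Stream 1: raw RGB image.
        self.rgb_stream = conv_block(3, 32)
        # Stream 2: monocular-cue maps from ontology reasoning (assumed input).
        self.cue_stream = conv_block(cue_channels, 32)
        # Fusion of the two streams and single-channel depth regression head.
        self.fusion = conv_block(64, 64)
        self.depth_head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, rgb: torch.Tensor, cues: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_stream(rgb), self.cue_stream(cues)], dim=1)
        return self.depth_head(self.fusion(fused))


if __name__ == "__main__":
    net = MultiStreamDepthNet(cue_channels=3)
    rgb = torch.randn(1, 3, 128, 416)   # RGB input
    cues = torch.randn(1, 3, 128, 416)  # cue maps from ontology reasoning
    print(net(rgb, cues).shape)         # -> torch.Size([1, 1, 128, 416])
```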
