Abstract

Monocular depth estimation (MDE) is a challenging yet crucial computer vision task that aims to generate an accurate depth map from a single image. Existing MDE approaches mainly rely on extracting and fusing diverse information from multi-level features to improve prediction accuracy. However, these methods often apply a traditional feature pyramid structure and neglect a comprehensive exploration of feature fusion paths across multiple levels. Moreover, a single feature fusion strategy has limited ability to optimize the network. We propose a novel related cross-level feature network (RCFNet) with cascaded self-distillation for monocular depth estimation. It comprises a cross-level feature enhancement (CLFE) module, a hierarchical feature cross refinement (HFCR) module, and a cascaded self-distillation (CSD) module. The CLFE module integrates cross-level features to further exploit the highest-level features, using a channel attention mechanism with hybrid weight operations to enhance the initial features. The HFCR module adaptively captures strongly correlated complementary information through a window-based multi-head cross-attention mechanism to generate refined features. The CSD module, which uses a hierarchical feature transformation loss, can be viewed as a virtual teacher that progressively extracts discriminative features within the network to improve gradient flow. Extensive experiments on the NYUv2 and KITTI datasets demonstrate that our method outperforms existing state-of-the-art (SOTA) MDE methods in terms of accuracy, capacity, and robustness.
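
To make the window-based multi-head cross-attention idea concrete, the following is a minimal NumPy sketch of the general technique, not the HFCR module itself. The abstract does not specify the module's layout, so everything here is an assumption for illustration: the function names, the window size, the head count, the use of one feature map as queries and another as keys and values, and the random projection matrices standing in for learned weights.

```python
import numpy as np


def window_partition(x, win):
    """Split an (H, W, C) map into non-overlapping windows of shape (win*win, C)."""
    H, W, C = x.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = x.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, nW, win, win, C)
    return x.reshape(-1, win * win, C)


def window_merge(windows, win, H, W):
    """Inverse of window_partition: rebuild the (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // win, W // win, win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, win, nW, win, C)
    return x.reshape(H, W, C)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_cross_attention(query_feat, context_feat, win=4, heads=2, rng=None):
    """Cross-attention computed independently inside each local window.

    Queries come from query_feat; keys and values come from context_feat.
    Hypothetical helper: the projections are random, not learned.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    H, W, C = query_feat.shape
    assert context_feat.shape == query_feat.shape
    assert C % heads == 0, "channels must split evenly across heads"
    d = C // heads

    # Random projections stand in for learned weight matrices (assumption).
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

    q = window_partition(query_feat, win) @ Wq    # (N, T, C)
    k = window_partition(context_feat, win) @ Wk  # (N, T, C)
    v = window_partition(context_feat, win) @ Wv  # (N, T, C)
    N, T, _ = q.shape

    def split_heads(t):
        return t.reshape(N, T, heads, d).transpose(0, 2, 1, 3)  # (N, heads, T, d)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)

    # Scaled dot-product attention, restricted to tokens within the same window.
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))  # (N, heads, T, T)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(N, T, C)
    return window_merge(out, win, H, W), attn


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    low = rng.standard_normal((8, 8, 4))   # e.g. one feature level
    high = rng.standard_normal((8, 8, 4))  # e.g. another level, resized to match
    refined, attn = window_cross_attention(low, high, win=4, heads=2)
    print(refined.shape, attn.shape)
```

Restricting attention to windows keeps the cost proportional to the number of windows times the squared window size, rather than the squared number of pixels, which is the usual motivation for window-based attention on dense prediction tasks.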
