Abstract
Depth and surface normal estimation are important for 3D geometric perception, with numerous applications including autonomous vehicles and robots. In this paper, we propose a lightweight Multi-stage Information Diffusion Network (MIDNet) for the simultaneous prediction of depth and surface normals from a single RGB image. To obtain semantic and detail-preserving features, we adopt a high-resolution network as our backbone to learn multi-scale features, which are then fused into shared features for the two tasks. To let the two tasks mutually boost each other, we propose a Cross-Correlation Attention Module (CCAM) that adaptively integrates information for the two predictions across multiple stages, covering both feature-level and task-level information interaction. Ablation studies show that the proposed multi-stage information diffusion strategy improves performance on both tasks at different levels. Compared with current state-of-the-art methods on the NYU Depth V2, Stanford 2D-3D-Semantic, and KITTI datasets, our method achieves superior performance on both monocular depth and surface normal estimation.
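The abstract does not specify CCAM's internals, so the following is a rough illustrative sketch only, not the authors' module: one generic way two task branches (depth and normals) could exchange information through cross-gated channel attention. All names, shapes, and the gating design here are assumptions.

```python
# Illustrative sketch only: a generic cross-task channel-attention fusion,
# NOT the paper's actual CCAM. All module names and shapes are assumptions.
import torch
import torch.nn as nn

class CrossCorrelationAttention(nn.Module):
    """Hypothetical cross-task attention: each branch re-weights its own
    features using a channel descriptor computed from the other branch."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 8)
        # One gating MLP per direction (depth -> normal, normal -> depth).
        self.gate_d2n = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels), nn.Sigmoid())
        self.gate_n2d = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels), nn.Sigmoid())

    def forward(self, feat_depth, feat_normal):
        # Global average pooling -> per-channel descriptors of shape (B, C).
        desc_d = feat_depth.mean(dim=(2, 3))
        desc_n = feat_normal.mean(dim=(2, 3))
        # Cross-gating: each task's features are modulated by the other
        # task's descriptor, letting information flow between the branches.
        w_n = self.gate_d2n(desc_d).unsqueeze(-1).unsqueeze(-1)
        w_d = self.gate_n2d(desc_n).unsqueeze(-1).unsqueeze(-1)
        # Residual connection keeps the original task features intact.
        return feat_depth + feat_depth * w_d, feat_normal + feat_normal * w_n

if __name__ == "__main__":
    ccam = CrossCorrelationAttention(channels=64)
    d = torch.randn(2, 64, 30, 40)   # depth-branch features (assumed shape)
    n = torch.randn(2, 64, 30, 40)   # normal-branch features (assumed shape)
    d_out, n_out = ccam(d, n)
    print(d_out.shape, n_out.shape)  # both torch.Size([2, 64, 30, 40])
```

In a multi-stage design such as the one the abstract describes, a module like this could be applied at several stages of the decoder so that the two predictions repeatedly refine each other; the actual diffusion schedule and attention formulation are detailed in the paper itself.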