Feature back-projection guided residual refinement for real-time stereo matching network

Bin Wen, Han Zhu, Chao Yang, Zhicong Li, Renxuan Cao

doi:10.1016/j.image.2022.116636

Abstract

In recent stereo matching research, deep convolutional neural networks (CNNs) have shown excellent performance in estimating depth from stereo image pairs. Previous works mainly focus on improving the robustness of the stereo matching network to obtain higher matching accuracy. In this paper, we propose an end-to-end real-time stereo matching network (FBPGNet). FBPGNet consists of three parts: a feature extraction module (FEM), an initial disparity estimation module (IDEM), and a feature back-projection guided residual refinement module (FBPG). The FEM, composed of residual blocks, dilated convolutions, and a spatial attention mechanism, is designed to capture semantic and contextual information. The IDEM uses an hourglass 3D convolution architecture to produce an initial low-resolution (LR) disparity map. The FBPG then refines the up-sampled low-resolution disparity map, taking the features from the FEM and the low-resolution disparity map as guide information. Experiments show that the proposed network achieves prediction accuracy and inference speed comparable to recent real-time stereo matching networks, and runs at 25 fps on a high-end GPU.
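The general idea of residual refinement (up-sample a coarse disparity map, then add a correction) can be illustrated with a minimal sketch. This is not the FBPG module itself: the function names are hypothetical, nearest-neighbour up-sampling and the rescaling of disparity values by the up-sampling factor are assumptions, and the residual is a hand-written placeholder standing in for what a learned module would predict.

```python
def upsample_nearest(disp, scale):
    """Nearest-neighbour up-sampling of a 2D disparity map (list of rows).

    Assumption: disparity values are multiplied by `scale`, because a pixel
    offset measured at low resolution spans `scale` times as many pixels at
    full resolution.
    """
    out = []
    for row in disp:
        # Repeat each (rescaled) value `scale` times along the width.
        wide = [d * scale for d in row for _ in range(scale)]
        # Repeat the widened row `scale` times along the height.
        out.extend([list(wide) for _ in range(scale)])
    return out


def refine(lr_disp, residual, scale):
    """Add a residual correction to the up-sampled disparity, clamped at 0."""
    up = upsample_nearest(lr_disp, scale)
    return [[max(0.0, u + r) for u, r in zip(up_row, res_row)]
            for up_row, res_row in zip(up, residual)]


lr_disp = [[1.0, 2.0]]               # 1x2 low-resolution disparity map
residual = [[0.5, 0.0, -0.5, 0.0],   # hypothetical 2x4 correction; a learned
            [0.0, 0.0, 0.0, -5.0]]   # refinement module would predict this
refined = refine(lr_disp, residual, scale=2)
```

The clamp at zero reflects that disparity between rectified stereo images is non-negative; whether the original network applies such a clamp is not stated in the abstract.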

Full Text