Deep Confidence Propagation Stereo Network

Kai Zeng, Yaonan Wang, Hui Zhang, Wei Wang, Qing Zhu, Jianxu Mao

doi:10.1109/tits.2023.3264705

Abstract

Stereo matching, which estimates depth from rectified image pairs, is of great importance to many computer vision tasks such as vehicle navigation and autonomous driving. Confidence measures are typically used to refine stereo matching results, which improves the robustness and efficiency of disparity estimation. However, previous learning-based confidence methods for stereo matching usually use intermediate results, or a combination of them, in a post-processing step to refine the stereo matching results. Such pipelines cannot be optimized end-to-end, and their performance is limited by the quality of the intermediate output. To handle this issue, we pursue an end-to-end hierarchical architecture and propose a differentiable confidence propagation (DCP) model for the cost aggregation network in stereo matching. The DCP model is integrated into an end-to-end hierarchical neural network architecture to guide matching cost volume aggregation. More specifically, to better represent the similarity of left and right feature maps, we extract unary context feature maps with an effective attention mechanism for matching cost construction. Moreover, we aggregate the cost volume with multiple stacked DCP cost aggregation (DCPCA) networks to generate a more reliable and finer cost volume. The network outputs multi-level disparity maps, and each output disparity is supervised with a different training weight so that learning proceeds in a coarse-to-fine way. 
Our method outperforms previous methods on the Sceneflow dataset, achieving an end-point error (EPE) of 0.6735 px, and achieves a D1-all error of 1.53% in non-occluded regions and a 5 px error of 0.72% on non-occluded pixels on the KITTI 2015 and 2012 datasets, respectively. Extensive experiments carried out on the KITTI Stereo benchmarks demonstrate that our DCPCA-Net achieves a significantly better trade-off between accuracy and efficiency for stereo matching.
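
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how EPE, a D1-style outlier rate, and an N-pixel error rate are commonly computed from predicted and ground-truth disparities. It is not the authors' evaluation code: the function names, the flat-list input format, and the optional `valid` mask (standing in for a non-occluded pixel mask) are assumptions made for illustration, and the thresholds follow the usual benchmark conventions (an outlier exceeds both 3 px and 5% of the true disparity).

```python
from typing import Optional, Sequence


def _valid_errors(pred: Sequence[float], gt: Sequence[float],
                  valid: Optional[Sequence[bool]] = None):
    """Return (absolute error, ground truth) pairs for pixels marked valid."""
    if valid is None:
        valid = [True] * len(gt)
    return [(abs(p - g), g) for p, g, v in zip(pred, gt, valid) if v]


def end_point_error(pred, gt, valid=None) -> float:
    """EPE: mean absolute disparity error in pixels over valid pixels."""
    pairs = _valid_errors(pred, gt, valid)
    return sum(err for err, _ in pairs) / len(pairs)


def d1_outlier_rate(pred, gt, valid=None,
                    abs_thresh: float = 3.0, rel_thresh: float = 0.05) -> float:
    """Percentage of valid pixels whose error exceeds BOTH thresholds
    (assumed D1-style convention: > 3 px and > 5% of the true disparity)."""
    pairs = _valid_errors(pred, gt, valid)
    bad = sum(1 for err, g in pairs if err > abs_thresh and err > rel_thresh * g)
    return 100.0 * bad / len(pairs)


def n_px_error_rate(pred, gt, valid=None, thresh: float = 5.0) -> float:
    """Percentage of valid pixels whose error exceeds `thresh` pixels."""
    pairs = _valid_errors(pred, gt, valid)
    bad = sum(1 for err, _ in pairs if err > thresh)
    return 100.0 * bad / len(pairs)


if __name__ == "__main__":
    # Toy disparities for four pixels (hypothetical values).
    pred = [10.0, 20.5, 34.0, 47.0]
    gt = [10.0, 20.0, 30.0, 40.0]
    print(end_point_error(pred, gt))
    print(d1_outlier_rate(pred, gt))
    print(n_px_error_rate(pred, gt))
```

Passing a boolean `valid` list restricts all three metrics to the selected pixels, which is how a non-occluded evaluation would be expressed in this sketch.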

Full Text