Stereo matching underpins applications such as 3D reconstruction, autonomous driving, and depth sensing, and has recently made tremendous progress. Extracting accurate disparities from a pair of rectified images, however, remains a significant challenge. In this paper, we propose an attention-fused cost volume construction and progressive cost aggregation network, namely AP-Net, to improve the accuracy of stereo matching. Specifically, we introduce attention fusion to learn feature similarity between corresponding points in the left and right images while capturing long-range contextual dependencies, and use it to construct attention-fused cost volumes. Moreover, we propose a two-stage progressive cost aggregation scheme that fully exploits information from the different cost volumes. Furthermore, to refine the initial disparity more accurately, we implement an information entropy-guided disparity refinement module, which uses information entropy as an explicit confidence measure of the initial disparity estimate to guide the computation of the disparity residual, thereby improving the overall accuracy of disparity estimation. This refinement module can also be seamlessly embedded into a variety of stereo matching networks, significantly improving performance with only a small increase in computation and parameters. Experimental results on multiple stereo datasets confirm that AP-Net consistently delivers competitive performance. The code is available at https://github.com/zhuys-bupt/AP-Net.
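The entropy-based confidence measure mentioned in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's exact implementation: the tensor layout, the softmax-over-disparity formulation of the probability distribution, and the normalization by the maximum entropy are all assumptions made for the sketch.

```python
import numpy as np

def entropy_confidence(cost_volume):
    """Per-pixel entropy of the disparity probability distribution.

    cost_volume: array of shape (D, H, W) holding matching costs for
    D disparity candidates at each pixel (lower cost = better match).
    A softmax over negated costs along the disparity axis yields a
    per-pixel probability distribution; its entropy is low when the
    distribution is peaked (confident estimate) and high when it is
    flat (ambiguous estimate).

    Returns entropy normalized to [0, 1] by the maximum entropy log(D).
    """
    logits = -cost_volume
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=0, keepdims=True)                    # softmax over disparities
    entropy = -(p * np.log(p + 1e-12)).sum(axis=0)       # Shannon entropy per pixel
    return entropy / np.log(cost_volume.shape[0])        # normalize to [0, 1]
```

Under this formulation, a refinement module could weight or gate the predicted disparity residual by the confidence map, trusting the initial estimate where entropy is low and correcting more aggressively where it is high.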