Abstract

To address the low matching accuracy of stereo matching algorithms in image regions with specular reflection, this paper proposes a multidimensional fusion stereo matching algorithm named MFANet. The algorithm embeds a multispectral attention module into the residual feature extraction network, using two-dimensional discrete cosine transforms to extract frequency-domain features. In the pyramid pooling module, a coordinate attention mechanism is introduced to capture relevant positional information. In the cost aggregation stage, MFANet incorporates a three-dimensional attention mechanism that emphasizes the more important semantic information in high-level features. By combining detailed information from low-level features, semantic information from high-level features, and contextual information, the algorithm generates features better suited to disparity prediction. MFANet is evaluated on three standard datasets (SceneFlow, KITTI 2015, and KITTI 2012). Experimental results demonstrate its robustness to specular-reflection interference and its accurate disparity prediction in ill-posed regions affected by specular reflection, indicating promising application prospects.
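The abstract does not give implementation details for the multispectral attention module, but frequency-based channel attention built on the 2-D DCT is a known pattern (popularized by FcaNet). The sketch below is an illustrative assumption of how such a module might pool channel groups with different DCT frequency components; the function names, the even channel split, and the plain sigmoid gate (in place of a learned MLP) are all simplifications, not the paper's actual design.

```python
import numpy as np

def dct2_basis(h, w, u, v):
    """2-D DCT-II basis function for frequency index (u, v) on an h x w grid."""
    ys = np.cos(np.pi * u * (2 * np.arange(h) + 1) / (2 * h))
    xs = np.cos(np.pi * v * (2 * np.arange(w) + 1) / (2 * w))
    return np.outer(ys, xs)

def multispectral_attention(feat, freqs):
    """Channel attention where each channel group is pooled with a different
    2-D DCT frequency component instead of plain global average pooling.

    feat  : (C, H, W) feature map
    freqs : list of (u, v) frequency indices; channels are split evenly
            across them.  (u, v) = (0, 0) recovers average pooling up to
            a constant factor.
    """
    c, h, w = feat.shape
    assert c % len(freqs) == 0, "channels must split evenly across frequencies"
    group = c // len(freqs)
    pooled = np.empty(c)
    for i, (u, v) in enumerate(freqs):
        basis = dct2_basis(h, w, u, v)
        chans = feat[i * group:(i + 1) * group]
        # project each channel onto the DCT basis -> one scalar per channel
        pooled[i * group:(i + 1) * group] = (chans * basis).sum(axis=(1, 2))
    # sigmoid gate re-weights channels (a learned bottleneck MLP would
    # normally sit between pooling and the gate)
    weights = 1.0 / (1.0 + np.exp(-pooled))
    return feat * weights[:, None, None]
```

Because higher-frequency DCT components respond to spatial variation rather than mean intensity, this kind of pooling can retain information that global average pooling discards, which is plausibly why it helps in reflective regions where raw intensity is unreliable.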