Currently, deep learning-based stereo matching relies on local convolutional networks, which lack sufficient global information for accurate disparity estimation. Motivated by the strong global representation capability of graphs, a novel Multi-scale Graph Neural Network (MGNN) is proposed to improve stereo matching from a global perspective. Firstly, we construct a multi-scale graph structure in which multi-scale nodes, carrying projected multi-scale image features, are directly linked by inner-scale and cross-scale edges, instead of relying solely on local convolutions for deep learning-based stereo matching. To enhance spatial position information in the non-Euclidean multi-scale graph space, we further propose a multi-scale position embedding that embeds the latent position features of Euclidean space into the projected multi-scale image features. Secondly, we propose multi-scale graph feature inference to extract global context information on the multi-scale graph structure. Thus, features are not only globally inferred within each scale but also interactively inferred across scales, so that global context information is comprehensively considered with multi-scale receptive fields. Finally, MGNN is deployed for dense stereo matching, and experiments demonstrate that our method achieves state-of-the-art performance on Scene Flow, KITTI 2012/2015, and Middlebury Stereo Evaluation v.3/2021.
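The abstract does not give implementation details, so the following PyTorch-style sketch is only a rough illustration of the general idea of message passing over a multi-scale graph with inner-scale and cross-scale edges; it is not the paper's method. All names and shapes (MultiScaleGraphLayer, inner_adj, cross_adj, the GRU-style update) are assumptions introduced here for illustration.

```python
import torch
import torch.nn as nn


class MultiScaleGraphLayer(nn.Module):
    """Illustrative message-passing layer over a multi-scale graph.

    Node features at each scale are updated from (a) inner-scale neighbors
    and (b) neighbors at the adjacent coarser scale, loosely mirroring the
    inner-scale / cross-scale edges described in the abstract.
    """

    def __init__(self, dim):
        super().__init__()
        self.inner_msg = nn.Linear(dim, dim)   # messages along inner-scale edges
        self.cross_msg = nn.Linear(dim, dim)   # messages along cross-scale edges
        self.update = nn.GRUCell(dim, dim)     # node update from aggregated messages

    def forward(self, nodes, inner_adj, cross_adj):
        # nodes:     list of (N_s, dim) node-feature tensors, one per scale s
        # inner_adj: list of (N_s, N_s) normalized adjacency matrices per scale
        # cross_adj: list of (N_s, N_{s+1}) matrices linking scale s to scale s+1
        out = []
        for s, x in enumerate(nodes):
            msg = inner_adj[s] @ self.inner_msg(x)            # aggregate within the scale
            if s + 1 < len(nodes):                            # pull context from the coarser scale
                msg = msg + cross_adj[s] @ self.cross_msg(nodes[s + 1])
            out.append(self.update(msg, x))                   # GRU-style node update
        return out


if __name__ == "__main__":
    dim = 32
    # Two scales with 64 and 16 nodes (e.g., pooled feature-map positions).
    nodes = [torch.randn(64, dim), torch.randn(16, dim)]
    inner = [torch.softmax(torch.randn(n, n), dim=-1) for n in (64, 16)]
    cross = [torch.softmax(torch.randn(64, 16), dim=-1)]
    layer = MultiScaleGraphLayer(dim)
    updated = layer(nodes, inner, cross)
    print([u.shape for u in updated])  # [(64, 32), (16, 32)]
```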