Abstract

Multi-modal remote sensing images exhibit significant background changes and complex spatial correspondences, and existing methods struggle to extract the common features between such images effectively, which leads to poor matching results. To improve matching performance by extracting highly robust features, this paper proposes CMRM (CNN multi-modal remote sensing matching), a multi-modal remote sensing matching algorithm based on deformable convolution and cross-attention. First, Deformable VGG16 (DeVgg) is constructed by introducing deformable convolutions into the VGG16 backbone network to adapt to the significant geometric distortions of remote sensing images of different shapes and scales. Second, the features extracted by DeVgg are fed into a cross-attention module to better capture the spatial correspondence between images with background changes. Finally, key points and their descriptors are extracted from the output feature map. In the feature matching stage, to address the poor matching quality of feature points, BFMatcher is used for coarse matching, followed by a RANSAC algorithm with an adaptive threshold to constrain the matches. The proposed algorithm performs well on the public HPatches dataset, achieving MMA values of 0.672, 0.710, and 0.785 at thresholds of 3, 4, and 5, respectively. The results show that, compared with existing methods, our method improves the matching accuracy of multi-modal remote sensing images.
