Abstract

Effective feature description for cross-modal remote sensing matching is challenging because of the complex geometric and radiometric differences between multimodal images. Current Siamese or pseudo-Siamese networks describe features from multimodal remote sensing images directly at the fully connected layer; however, the similarity of cross-modal features during feature extraction is barely considered. Therefore, in this paper we construct a cross-modal feature description matching network (CM-Net) for remote sensing image matching. First, a contextual self-attention module is proposed to add semantic global dependency information using candidate and non-candidate keypoint patches. Then, a cross-fusion module is designed to obtain cross-modal feature descriptions through information interaction. Finally, a similarity matching loss function is presented to optimize discriminative feature representations, converting the matching task into a classification task. The proposed CM-Net is evaluated through qualitative and quantitative experiments on four multimodal image datasets, achieving an average Matching Score (M.S.) of 0.781, a Mean Matching Accuracy (MMA) of 0.275, and an average Root-Mean-Square Error (aRMSE) of 1.726. The comparative study demonstrates the superior performance of the proposed CM-Net for remote sensing image matching.
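To make the three components concrete, the following is a minimal PyTorch sketch of a self-attention block over keypoint patch embeddings, a cross-attention-based fusion of the two modality branches, and a classification-style similarity matching loss. The layer sizes, module structure, and the exact fusion and loss formulations here are illustrative assumptions, not the authors' CM-Net implementation.

```python
# Hypothetical sketch of the modules described in the abstract (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextualSelfAttention(nn.Module):
    """Self-attention over keypoint patch embeddings to inject global context."""

    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patches):            # patches: (B, N, dim), candidate + non-candidate
        ctx, _ = self.attn(patches, patches, patches)
        return self.norm(patches + ctx)    # residual connection preserves local detail


class CrossFusion(nn.Module):
    """Cross-attention so each modality's features attend to the other modality."""

    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat_a, feat_b):      # (B, N, dim) features from the two modalities
        fused_a, _ = self.attn_a(feat_a, feat_b, feat_b)
        fused_b, _ = self.attn_b(feat_b, feat_a, feat_a)
        return feat_a + fused_a, feat_b + fused_b


def similarity_matching_loss(desc_a, desc_b):
    """Cast matching as classification: descriptor i in A should select
    its counterpart i in B among all candidates via cross-entropy."""
    desc_a = F.normalize(desc_a, dim=-1)
    desc_b = F.normalize(desc_b, dim=-1)
    logits = desc_a @ desc_b.t()                        # (N, N) cosine-similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)


# Toy usage with random patch embeddings from two modalities (e.g. optical and SAR).
if __name__ == "__main__":
    a = torch.randn(2, 64, 128)
    b = torch.randn(2, 64, 128)
    csa, fuse = ContextualSelfAttention(), CrossFusion()
    fa, fb = fuse(csa(a), csa(b))
    loss = similarity_matching_loss(fa.flatten(0, 1), fb.flatten(0, 1))
    print(loss.item())
```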
