Abstract

Effective feature description for cross-modal remote sensing matching is challenging because of the complex geometric and radiometric differences between multimodal images. Current Siamese or pseudo-Siamese networks describe features from multimodal remote sensing images directly at the fully connected layer; however, the similarity of cross-modal features during feature extraction is barely considered. Therefore, in this paper we construct a cross-modal feature description matching network (CM-Net) for remote sensing image matching. First, a contextual self-attention module is proposed to add semantic global dependency information using candidate and non-candidate keypoint patches. Then, a cross-fusion module is designed to obtain cross-modal feature descriptions through information interaction. Finally, a similarity matching loss function is presented to optimize discriminative feature representations, converting the matching task into a classification task. The proposed CM-Net is evaluated through qualitative and quantitative experiments on four multimodal image datasets, achieving an average Matching Score (M.S.) of 0.781, a Mean Matching Accuracy (MMA) of 0.275, and an average Root-Mean-Square Error (aRMSE) of 1.726. The comparative study demonstrates the superior performance of the proposed CM-Net for remote sensing image matching.
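To make the three components concrete, the following is a minimal PyTorch sketch of a self-attention block over keypoint patch embeddings, a cross-attention-based fusion of the two modality branches, and a classification-style similarity matching loss. The layer sizes, module structure, and the exact fusion and loss formulations here are illustrative assumptions, not the authors' CM-Net implementation.

```python
# Hypothetical sketch of the modules described in the abstract (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextualSelfAttention(nn.Module):
    """Self-attention over keypoint patch embeddings to inject global context."""

    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patches):            # patches: (B, N, dim), candidate + non-candidate
        ctx, _ = self.attn(patches, patches, patches)
        return self.norm(patches + ctx)    # residual connection preserves local detail


class CrossFusion(nn.Module):
    """Cross-attention so each modality's features attend to the other modality."""

    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat_a, feat_b):      # (B, N, dim) features from the two modalities
        fused_a, _ = self.attn_a(feat_a, feat_b, feat_b)
        fused_b, _ = self.attn_b(feat_b, feat_a, feat_a)
        return feat_a + fused_a, feat_b + fused_b


def similarity_matching_loss(desc_a, desc_b):
    """Cast matching as classification: descriptor i in A should select
    its counterpart i in B among all candidates via cross-entropy."""
    desc_a = F.normalize(desc_a, dim=-1)
    desc_b = F.normalize(desc_b, dim=-1)
    logits = desc_a @ desc_b.t()                        # (N, N) cosine-similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)


# Toy usage with random patch embeddings from two modalities (e.g. optical and SAR).
if __name__ == "__main__":
    a = torch.randn(2, 64, 128)
    b = torch.randn(2, 64, 128)
    csa, fuse = ContextualSelfAttention(), CrossFusion()
    fa, fb = fuse(csa(a), csa(b))
    loss = similarity_matching_loss(fa.flatten(0, 1), fb.flatten(0, 1))
    print(loss.item())
```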
