Abstract

With the growing volume of cross-modal data, cross-modal retrieval has attracted increasing attention in remote sensing (RS), since it offers a more flexible and convenient way to obtain information of interest than traditional single-modality retrieval. However, existing methods cannot fully exploit the semantic information: they focus only on semantic consistency and ignore the complementary information between different modalities. In this letter, to bridge the modality gap, we propose a novel fusion-based correlation learning model (FCLM) for image-text retrieval in RS. Specifically, a cross-modal fusion network is designed to capture the intermodality complementary information and produce a fused feature. The fused knowledge is further transferred via knowledge distillation to supervise the learning of the modality-specific networks, which improves the discriminative ability of the feature representations and enhances intermodality semantic consistency, thereby alleviating the heterogeneity gap. Finally, extensive experiments have been conducted on a public dataset, and the results show that FCLM is effective for cross-modal retrieval and outperforms several baseline methods.
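
The abstract names two core ideas: a fusion network that builds a joint image-text representation, and knowledge distillation that transfers the fused knowledge back to the modality-specific branches used at retrieval time. The following is a minimal PyTorch-style sketch of that general scheme, not the paper's actual FCLM implementation; the module names (ModalityEncoder, FusionNetwork), layer sizes, and the specific loss forms (MSE distillation plus a contrastive consistency term) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Maps pre-extracted modality features (e.g., CNN image features or
    text embeddings) into a shared d-dimensional embedding space."""
    def __init__(self, in_dim, embed_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.ReLU(inplace=True),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class FusionNetwork(nn.Module):
    """Concatenates image and text embeddings and learns a fused
    representation intended to capture complementary cross-modal cues."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.ReLU(inplace=True),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, img_emb, txt_emb):
        fused = self.fuse(torch.cat([img_emb, txt_emb], dim=-1))
        return F.normalize(fused, dim=-1)

def distillation_loss(student_emb, fused_emb):
    """Transfers fused (teacher) knowledge to a modality-specific (student)
    branch by pulling the student embedding toward the fused embedding."""
    return F.mse_loss(student_emb, fused_emb.detach())

def consistency_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss enforcing intermodality semantic
    consistency between matching image-text pairs in a batch."""
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy forward/backward pass with random features standing in for real RS data.
img_feat, txt_feat = torch.randn(8, 2048), torch.randn(8, 300)
img_enc, txt_enc, fusion = ModalityEncoder(2048), ModalityEncoder(300), FusionNetwork()
img_emb, txt_emb = img_enc(img_feat), txt_enc(txt_feat)
fused = fusion(img_emb, txt_emb)
loss = (consistency_loss(img_emb, txt_emb)
        + distillation_loss(img_emb, fused)
        + distillation_loss(txt_emb, fused))
loss.backward()
```

At test time only the modality-specific encoders would be needed, so cross-modal retrieval reduces to nearest-neighbor search between the image and text embeddings; the fusion branch serves purely as a training-time teacher in this sketch.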
