Abstract

Cross-modal remote sensing image-text retrieval (CMRSITR) is an active research topic in the remote sensing (RS) community. Benefiting from large pretrained image-text models, many successful CMRSITR methods have been proposed in recent years. Although their performance is attractive, some challenges remain. First, fine-tuning large pretrained models requires a significant amount of computational resources. Second, most large models are pretrained on natural images, which reduces their effectiveness in processing RS images. To tackle these challenges, we propose a new CMRSITR network named context and uncertainty-aware prompt (CUP). First, prompt tuning is introduced into CUP to reduce the optimization burden. By training only the prompt tokens rather than all parameters, the large model's knowledge can be transferred to the CMRSITR task with a small number of trainable parameters. Second, considering the differences between natural-image priors and RS images, apart from adopting free prompt tokens, we develop a prompt generation module (PGM) to produce RS-oriented prompt tokens. These specific prompt tokens are rich in object-level information about RS images, which helps CUP narrow the gap between natural-image-pretrained models and RS data. Third, we further design an uncertainty estimation module (UEM) to reduce the uncertainty caused by the model and the data. In this way, not only are the semantic misalignment and intraclass diversity imbalance problems mitigated, but RS clues can also be explored in depth. Experimental results on three public benchmark datasets demonstrate that CUP achieves competitive performance on the CMRSITR task compared with many existing methods. Our source code is available at: https://github.com/TangXu-Group/Cross-modal-remote-sensing-image-and-text-retrieval-models/tree/main/CUP.
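The core efficiency idea above, training a handful of prompt tokens while the pretrained backbone stays frozen, can be illustrated with a minimal sketch. This is not the paper's implementation (the actual CUP architecture, PGM, and UEM are not specified in the abstract); it only shows, with NumPy and hypothetical dimensions, how prompt tokens are prepended to a frozen token sequence and how few parameters they add:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper)
d = 8          # embedding dimension
n_patches = 4  # tokens produced by the frozen image encoder
n_prompts = 2  # learnable prompt tokens

# Frozen backbone embeddings: kept fixed during prompt tuning
patch_tokens = rng.standard_normal((n_patches, d))

# Free prompt tokens: the only quantities that would be optimized
free_prompts = rng.standard_normal((n_prompts, d))

def prepend_prompts(prompts: np.ndarray, tokens: np.ndarray) -> np.ndarray:
    """Prepend learnable prompt tokens to the frozen token sequence,
    forming the input to the (frozen) transformer layers."""
    return np.concatenate([prompts, tokens], axis=0)

seq = prepend_prompts(free_prompts, patch_tokens)
print(seq.shape)  # (6, 8): prompts followed by frozen patch tokens

# Only the prompts count as trainable parameters
trainable = free_prompts.size
total = free_prompts.size + patch_tokens.size
print(trainable, total)  # 16 48
```

In a real setting the backbone would be a pretrained vision-language model with gradients disabled, and the RS-oriented prompts from the PGM would be conditioned on image content rather than freely initialized as here.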
