Abstract

Cross-modal remote sensing image-text retrieval (CMRSITR) is a challenging topic in the remote sensing (RS) community. It has attracted growing attention because it can be flexibly applied in many practical scenarios. In the current deep learning era, many successful CMRSITR methods built on deep convolutional neural networks (DCNNs) have been proposed. Most of them first learn valuable features from RS images and texts separately; the obtained visual and textual features are then mapped into a common space for the final retrieval. Although feasible, this pipeline leaves two difficulties unsolved. One is that the semantics of the visual and textual features are misaligned because the two modalities are learned independently. The other is that simple common-space mapping cannot fully explore the deep links between RS images and texts. To overcome these challenges, we propose a new model named interacting-enhancing feature transformer (IEFT) for CMRSITR, which treats the RS images and texts as a whole. First, a simple feature embedding module (FEM) maps images and texts into the visual and textual feature spaces. Second, an information interacting-enhancing module (IIEM) simultaneously models the inner relationships between RS images and texts and enhances the visual features. IIEM consists of three feature interacting-enhancing (FIE) blocks, each of which contains an inter-modality relationship interacting (IMRI) sub-block and a visual feature enhancing (VFE) sub-block. IMRI exploits the hidden relations between the cross-modal data, while VFE refines the visual features. Combining them mitigates the semantic bias and captures the complex contents of RS images. Finally, a retrieval module (RM) generates the matching scores that decide the search results. Extensive experiments on four public RS datasets demonstrate that our IEFT achieves superior retrieval performance compared with many existing methods. Our source code is available at https://github.com/TangXu-Group/Cross-modal-remote-sensing-image-and-text-retrieval-models/tree/main/IEFT.
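To make the described pipeline concrete, below is a minimal PyTorch sketch of the FEM → IIEM (three FIE blocks, each an IMRI plus a VFE sub-block) → RM flow. It is not the paper's implementation: realizing IMRI as cross-attention from visual tokens to textual tokens, VFE as self-attention with an MLP, the feature dimensions, the mean pooling, the linear score head, and all class and parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class FIEBlock(nn.Module):
    """Hypothetical FIE block: an IMRI sub-block followed by a VFE sub-block."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        # IMRI (assumed form): cross-attention, visual queries over textual keys/values
        self.imri = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        # VFE (assumed form): self-attention plus MLP to enhance the visual features
        self.vfe = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, v: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Inter-modality relationship interacting: visual tokens attend to text tokens
        v = v + self.imri(self.norm1(v), t, t, need_weights=False)[0]
        # Visual feature enhancing: refine the interacted visual tokens
        q = self.norm2(v)
        v = v + self.vfe(q, q, q, need_weights=False)[0]
        v = v + self.mlp(self.norm3(v))
        return v


class IEFTSketch(nn.Module):
    """Toy end-to-end flow: FEM -> IIEM (3 FIE blocks) -> RM matching score."""

    def __init__(self, img_dim: int = 2048, txt_dim: int = 768, dim: int = 512):
        super().__init__()
        # FEM (assumed form): project image and text features to a shared width
        self.img_embed = nn.Linear(img_dim, dim)
        self.txt_embed = nn.Linear(txt_dim, dim)
        # IIEM: three stacked FIE blocks, as the abstract describes
        self.iiem = nn.ModuleList([FIEBlock(dim) for _ in range(3)])
        # RM (assumed form): score head on pooled visual and textual features
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, img_feats: torch.Tensor, txt_feats: torch.Tensor) -> torch.Tensor:
        v, t = self.img_embed(img_feats), self.txt_embed(txt_feats)
        for blk in self.iiem:
            v = blk(v, t)
        pooled = torch.cat([v.mean(dim=1), t.mean(dim=1)], dim=-1)
        return self.score(pooled).squeeze(-1)  # one matching score per image-text pair


# Usage: score a batch of 4 image-text pairs with random toy token sequences
model = IEFTSketch()
scores = model(torch.randn(4, 49, 2048), torch.randn(4, 20, 768))
print(scores.shape)  # torch.Size([4])
```

In this reading, the matching score replaces plain common-space similarity: the visual tokens are repeatedly conditioned on the text before scoring, which is how joint interaction could mitigate the semantic misalignment that independent encoders leave behind.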
