Abstract

Because of the rapid growth of multimodal data on the internet and social media, cross-modal retrieval has become an important and valuable task in recent years. The purpose of cross-modal retrieval is to obtain result data in one modality (e.g., image) that are semantically similar to the query data in another modality (e.g., text). In the field of remote sensing, despite a great number of existing works on image retrieval, there has been only a small amount of research on cross-modal image-text retrieval, owing to the scarcity of datasets and the complicated characteristics of remote sensing image data. In this article, we introduce a novel cross-modal image-text retrieval network to establish a direct relationship between remote sensing images and their paired text data. Specifically, we design a semantic alignment module that fully explores the latent correspondence between images and text, using attention and gate mechanisms to filter and optimize data features so that more discriminative feature representations can be obtained. Experimental results on four benchmark remote sensing datasets, including UCMerced-LandUse-Captions, Sydney-Captions, RSICD, and NWPU-RESISC45-Captions, show that our proposed method outperforms other baselines and achieves state-of-the-art performance on remote sensing image-text retrieval tasks.
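The abstract does not spell out the semantic alignment module, but the attention-and-gate filtering it describes can be illustrated with a minimal PyTorch sketch. Everything below (module name, dimensions, and the exact gated-fusion rule) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of an attention-plus-gate feature filter, loosely
# following the abstract's description of the semantic alignment module.
# Names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn


class AttentionGateFilter(nn.Module):
    """Cross-attend query features to context features, then gate the result."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.scale = dim ** -0.5
        # The gate decides, per channel, how much attended context to keep.
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query:   (batch, n_q, dim)  e.g., image region features
        # context: (batch, n_c, dim)  e.g., word features of the caption
        attn = torch.softmax(query @ context.transpose(1, 2) * self.scale, dim=-1)
        attended = attn @ context                      # (batch, n_q, dim)
        g = torch.sigmoid(self.gate(torch.cat([query, attended], dim=-1)))
        return g * attended + (1 - g) * query          # gated fusion


# Toy usage: 36 image regions attending to 20 caption tokens.
img = torch.randn(2, 36, 512)
txt = torch.randn(2, 20, 512)
out = AttentionGateFilter(512)(img, txt)
print(out.shape)  # torch.Size([2, 36, 512])
```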

Highlights

  • With the rapid development of Earth observation technology, the quantity and quality of remote sensing data have increased rapidly.

  • We show the whole structure of our proposed deep image–text semantic alignment network in Fig. 2, which mainly includes the following three parts: 1) extraction of remote sensing image features; 2) extraction of text features; and 3) a semantic alignment module (SAM).

  • To demonstrate the effectiveness of our proposed method, we evaluate our SAM on four public datasets: UCMerced-LandUse-Captions, Sydney-Captions, RSICD, and NWPU-RESISC45-Captions.
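For context on how such retrieval experiments are typically scored, below is a minimal sketch of the standard Recall@K metric commonly reported on these benchmarks. The random similarity matrix and the one-match-per-query ground truth are illustrative assumptions; the paper's exact protocol (e.g., multiple captions per image) may differ.

```python
# Minimal sketch of the standard Recall@K metric for image-text retrieval.
# The similarity matrix here is random stand-in data; ground truth i == j
# (one match per query) is an assumption, not the paper's exact protocol.
import numpy as np


def recall_at_k(sim: np.ndarray, k: int) -> float:
    """sim[i, j] = similarity of query i to candidate j; ground truth is i == j."""
    # Rank candidates for each query from most to least similar.
    ranks = np.argsort(-sim, axis=1)
    # A hit: the matching candidate appears among the top-k results.
    hits = (ranks[:, :k] == np.arange(sim.shape[0])[:, None]).any(axis=1)
    return float(hits.mean())


sim = np.random.randn(100, 100)  # e.g., 100 text queries vs. 100 images
for k in (1, 5, 10):
    print(f"R@{k} = {recall_at_k(sim, k):.3f}")
```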

Introduction

With the rapid development of Earth observation technology, the quantity and quality of remote sensing data have increased rapidly, spurring a large body of research on the remote sensing image retrieval task [1]–[6]. Rather than retrieving within unimodal data, people are increasingly inclined to search for the required information in multimodal data with richer semantics. Cross-modal retrieval technology can mine effective information and has broad application prospects in many fields, such as disaster early warning and resource management. While satisfactory accuracy has been achieved in the cross-modal retrieval of natural images [7]–[9], it remains difficult to implement effective and efficient cross-modal retrieval of remote sensing images, since these images have complicated characteristics such as multiple scales, small targets, high resolution, and a lack of annotated information.
