Abstract

With the rapid growth and abundance of remote sensing data across modalities, cross-modal retrieval tasks have gained importance in the research community. Cross-modal retrieval refers to the paradigm in which the query belongs to one modality and the retrieved output to another. In this paper, the remote sensing (RS) modalities considered are optical Earth-observation data (aerial images) and the corresponding hand-drawn sketches. The main challenge in cross-modal retrieval between optical remote sensing images and sketches is the distribution gap between the two modalities in the shared embedding space. Prior attempts to bridge this gap have not achieved satisfactory sketch-image retrieval accuracy for RS data. Existing state-of-the-art approaches rely on conventional convolutional architectures, which capture only local, pixel-wise information from each modality. This limits the interaction between sketch texture and the corresponding image and makes these models prone to overfitting to datasets with particular scenarios. To circumvent this limitation, we propose SPCA-Net, a novel architecture that combines self- and cross-attention to establish multi-modal correspondence and minimize the modality gap by applying attention across the query and target modalities. The proposed attention architecture emphasizes global information from the query modality and bridges the domain gap through a pairwise cross-attention network, enabling efficient cross-modal retrieval. In addition to the architecture, this paper introduces a label-specific supervised contrastive loss, tailored to the intricacies of the task, which enhances the discriminative power of the learned embeddings. Extensive evaluations are conducted on two sketch-image remote sensing datasets, Earth-on-Canvas and RSketch. Under identical experimental conditions, the proposed model outperforms state-of-the-art architectures by significant margins of 16.7%, 18.9%, 33.7%, and 40.9%, respectively.
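
The abstract does not specify the exact layer layout of SPCA-Net or the label-specific weighting of its contrastive loss, so the PyTorch sketch below is only illustrative: it assumes standard multi-head attention blocks (self-attention within each modality followed by pairwise cross-attention between them) and the plain supervised contrastive formulation of Khosla et al. (2020). The module names, pooling choice, and dimensions are hypothetical, not the authors' implementation.

```python
# Illustrative sketch only: pairwise cross-attention between sketch and image
# tokens, followed by a supervised contrastive loss in the shared space.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PairwiseCrossAttention(nn.Module):
    """Self-attention within each modality, then cross-attention across them.
    Hypothetical layout; SPCA-Net may order or share these blocks differently."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.self_attn_sketch = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn_sketch = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, sketch_tokens, image_tokens):
        # Intra-modal (self) attention captures global context in each modality.
        s, _ = self.self_attn_sketch(sketch_tokens, sketch_tokens, sketch_tokens)
        i, _ = self.self_attn_image(image_tokens, image_tokens, image_tokens)
        # Inter-modal (cross) attention: each modality queries the other.
        s_cross, _ = self.cross_attn_sketch(s, i, i)
        i_cross, _ = self.cross_attn_image(i, s, s)
        # Pool tokens into one embedding per sample for retrieval.
        return s_cross.mean(dim=1), i_cross.mean(dim=1)


def supervised_contrastive_loss(embeddings, labels, temperature: float = 0.07):
    """Embeddings that share a class label are pulled together; all other
    pairs are pushed apart (standard supervised contrastive loss)."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    mask_pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    mask_pos.fill_diagonal_(0)                          # exclude self-pairs
    logits_mask = torch.ones_like(sim).fill_diagonal_(0)
    exp_sim = torch.exp(sim) * logits_mask
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)
    pos_per_row = mask_pos.sum(dim=1).clamp(min=1)
    return -(mask_pos * log_prob).sum(dim=1).div(pos_per_row).mean()


if __name__ == "__main__":
    # Toy usage: a batch of 8 sketch/image pairs with 5-class labels.
    block = PairwiseCrossAttention(dim=64)
    sketch = torch.randn(8, 49, 64)   # 8 sketches, 49 tokens, 64-d features
    image = torch.randn(8, 49, 64)    # matching aerial-image tokens
    labels = torch.randint(0, 5, (8,))
    s_emb, i_emb = block(sketch, image)
    loss = supervised_contrastive_loss(
        torch.cat([s_emb, i_emb]), torch.cat([labels, labels])
    )
    print(loss.item())
```

Pooling the cross-attended tokens into a single vector per sample keeps retrieval a simple nearest-neighbour search in the shared embedding space; the contrastive term then ties sketch and image embeddings of the same class together across modalities.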
