Abstract
Remote Sensing Image Captioning (RSIC) plays a crucial role in advancing semantic understanding of remote sensing data and has increasingly become a focus of research. Nevertheless, existing RSIC methods struggle with the multi-scale objects and complex backgrounds inherent in Remote Sensing Images (RSIs), as well as with the information disparities between the visual and textual modalities. To address these challenges, we propose MC-Net, a multi-scale contextual information aggregation image captioning network. The network comprises an image encoder enhanced with a multi-scale feature extraction module, a feature fusion module, and an adaptive decoder equipped with a visual-text alignment module. MC-Net extracts informative multi-scale features using a multilayer perceptron and a transformer, and introduces an adaptive gating mechanism during decoding to align visual regions with their corresponding text descriptions. Experiments on four public cross-modal datasets demonstrate that MC-Net is more robust and effective than contemporary RSIC methods.
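To make the decoding-time alignment idea concrete, the sketch below shows one common way an adaptive gating mechanism can blend a visual context vector with a decoder hidden state. This is not the authors' implementation; the class name, dimensions, and the specific sigmoid-gated convex blend are illustrative assumptions consistent with the abstract's description.

```python
# Minimal sketch (assumed, not the paper's code) of an adaptive gate that
# decides, per feature dimension, how much visual context versus textual
# decoder state to pass on at each decoding step.
import torch
import torch.nn as nn


class AdaptiveGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Gate is computed from the concatenation of both modalities.
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, visual_ctx: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # g in (0, 1)^dim: how strongly to attend to visual information.
        g = torch.sigmoid(self.gate(torch.cat([visual_ctx, hidden], dim=-1)))
        # Convex blend: g weights the visual context, (1 - g) the text state.
        return g * visual_ctx + (1.0 - g) * hidden


if __name__ == "__main__":
    gate = AdaptiveGate(dim=512)
    visual_ctx = torch.randn(4, 512)   # attended image features, batch of 4
    hidden = torch.randn(4, 512)       # decoder hidden states
    fused = gate(visual_ctx, hidden)
    print(fused.shape)                 # torch.Size([4, 512])
```

A learned gate of this kind lets the decoder rely on image regions when generating visually grounded words and fall back on the language state for function words, which is the intuition behind visual-text alignment during decoding.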