Abstract

Visual question answering (VQA) for remote sensing (RS) images is a representative multi-modal task that draws on advances in natural language processing and computer vision. However, it remains challenging for two reasons. First, an RS image contains a wealth of visual elements, only a few of which are relevant to a given question; as a result, the regions a human would look at to answer a question differ from those highlighted by current attention approaches. Second, the class imbalance in existing RSVQA datasets biases predictions towards frequent answers. To address these issues, we explore the intrinsic relationship between RS visual elements and the generated text while compensating for sample imbalance. We propose a new method, the Union Context-wise and Alternate-Guided Attention Network (UCAGAN). Our method uses a cross-modal alternate-guided attention module to align visual and textual features. Moreover, we introduce an improved multi-category loss function to compensate for the model bias caused by sample imbalance. Extensive experiments on a diverse range of datasets demonstrate that our approach is both effective and efficient, achieving state-of-the-art performance. Our work provides results that are not only correctable but also explainable, ultimately supporting the development of reliable VQA models for RS images.
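The abstract does not specify the architecture in detail, so the following is only a minimal illustrative sketch of the two ideas it names: attention over one modality guided by the other, applied alternately in both directions, and a frequency-weighted multi-category loss for imbalance compensation. All module names, feature shapes, and the inverse-frequency weighting scheme are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): alternate cross-modal guided
# attention plus an inverse-frequency-weighted cross-entropy loss, assuming
# region-level image features and word-level question features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedAttention(nn.Module):
    """Attend over `context` features using a `guide` vector as the query."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, guide, context):
        # guide: (B, D), context: (B, N, D)
        q = self.query(guide).unsqueeze(1)             # (B, 1, D)
        k = self.key(context)                          # (B, N, D)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5   # (B, N)
        attn = F.softmax(scores, dim=-1)               # attention weights over N items
        return (attn.unsqueeze(-1) * context).sum(1)   # (B, D) attended summary

class AlternateGuidedFusion(nn.Module):
    """Let the question guide attention over image regions, then let the
    attended visual feature guide attention back over the question words."""
    def __init__(self, dim, num_answers):
        super().__init__()
        self.text_to_image = GuidedAttention(dim)
        self.image_to_text = GuidedAttention(dim)
        self.classifier = nn.Linear(2 * dim, num_answers)

    def forward(self, img_regions, word_feats, q_summary):
        # img_regions: (B, R, D), word_feats: (B, T, D), q_summary: (B, D)
        v = self.text_to_image(q_summary, img_regions)  # question-guided visual feature
        t = self.image_to_text(v, word_feats)           # image-guided textual feature
        return self.classifier(torch.cat([v, t], dim=-1))

def balanced_ce_loss(logits, targets, class_counts):
    # Down-weight frequent answer classes with inverse-frequency weights; this
    # is one common imbalance compensation, the paper's exact loss may differ.
    weights = 1.0 / class_counts.clamp(min=1).float()
    weights = weights / weights.sum() * len(class_counts)
    return F.cross_entropy(logits, targets, weight=weights)

# Toy usage with random features and answer-frequency counts.
B, R, T, D, A = 4, 36, 14, 512, 100
model = AlternateGuidedFusion(D, A)
logits = model(torch.randn(B, R, D), torch.randn(B, T, D), torch.randn(B, D))
loss = balanced_ce_loss(logits, torch.randint(0, A, (B,)),
                        torch.randint(1, 500, (A,)))
loss.backward()
```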
