Abstract
Medical Visual Question Answering (VQA) is a multimodal task that answers clinical questions about medical images. Existing methods have achieved good performance, but most medical VQA models focus on visual content while ignoring the influence of textual content. To address this issue, this paper proposes an Attention-based Multimodal Alignment Model (AMAM) for medical VQA, which aligns text-based and image-based attention to enrich the textual features. First, we develop an Image-to-Question (I2Q) attention and a Word-to-Question (W2Q) attention to model the relations of both visual and textual content to the question. Second, we design a composite loss that combines a classification loss with an Image–Question Complementary (IQC) loss. The IQC loss aligns the question-word importance learned from visual features with that learned from textual features, emphasizing meaningful words in questions and improving the quality of predicted answers. Benefiting from the attention mechanisms and the composite loss, AMAM obtains semantically rich textual information and accurate answers. Finally, because the VQA-RAD dataset contains data errors and missing labels, we further construct an enhanced dataset, VQA-RADPh, to improve data quality. Experimental results on public datasets show that AMAM performs better than advanced existing methods. Our source code is available at: https://github.com/shuning-ai/AMAM/tree/master.
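To make the composite objective concrete, the sketch below illustrates one plausible reading of it: a standard answer-classification loss plus an alignment term between the image-derived (I2Q) and word-derived (W2Q) attention distributions over question words. This is a minimal illustration only; the function name, the `lam` weighting hyperparameter, and the use of mean-squared error as the alignment distance are assumptions, not the paper's exact formulation (see the released source code for the authors' implementation).

```python
import torch
import torch.nn.functional as F

def composite_loss(logits, answer_labels, i2q_attn, w2q_attn, lam=1.0):
    """Hypothetical sketch of a classification + IQC-style alignment objective.

    logits:        (B, num_answers) answer-classification scores
    answer_labels: (B,) ground-truth answer indices
    i2q_attn:      (B, L) attention over question words derived from image features (I2Q)
    w2q_attn:      (B, L) attention over question words derived from textual features (W2Q)
    lam:           assumed weight balancing the two terms
    """
    # Standard classification loss over candidate answers.
    cls_loss = F.cross_entropy(logits, answer_labels)

    # Alignment ("IQC"-style) term: encourage the two attention distributions
    # over question words to agree. MSE is assumed here for simplicity; the
    # paper may use a different distance between the distributions.
    iqc_loss = F.mse_loss(i2q_attn, w2q_attn)

    return cls_loss + lam * iqc_loss
```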