Vision–Language Model for Visual Question Answering in Medical Imagery

Yakoub Bazi, Mansour Zuair, Mohamad Mahmoud Al Rahhal, Laila Bashmal

doi:10.3390/bioengineering10030380

Open Access

https://doi.org/10.3390/bioengineering10030380

Journal: Bioengineering
Publication Date: Mar 20, 2023
Citations: 10
License type: CC BY 4.0

Affiliation: King Saud University

Abstract

Medical images play a critical role in the clinical and healthcare domains. A mature medical visual question answering (VQA) system can improve diagnosis by answering clinical questions posed about a medical image. Despite its enormous potential in healthcare services, this technology is still in its infancy and far from practical use. This paper introduces an approach based on a transformer encoder-decoder architecture. Specifically, we extract image features using the vision transformer (ViT) model and embed the question using a textual transformer encoder. We then concatenate the resulting visual and textual representations and feed them into a multi-modal decoder that generates the answer autoregressively. In the experiments, we validate the proposed model on two VQA datasets for radiology images, termed VQA-RAD and PathVQA. The model shows promising results compared to existing solutions. It yields closed and open accuracies of 84.99% and 72.97%, respectively, for VQA-RAD, and 83.86% and 62.37%, respectively, for PathVQA. Other metrics, such as the BLEU score, which measures the alignment between the predicted and true answer sentences, are also reported.
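
To make the reported metrics concrete, the following is a minimal sketch of how closed and open accuracies and a unigram BLEU score might be computed. It is not the authors' evaluation code: the sample keys (`pred`, `truth`, `closed`), the lowercase/whitespace normalization, the exact-match criterion, and the choice of BLEU-1 for a single reference are all assumptions made for illustration, since the abstract does not specify the protocol.

```python
import math
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def split_accuracy(samples):
    """Exact-match accuracy, computed separately for closed and open questions.

    Each sample is a dict with hypothetical keys 'pred', 'truth', 'closed'.
    """
    hits = {"closed": 0, "open": 0}
    totals = {"closed": 0, "open": 0}
    for s in samples:
        kind = "closed" if s["closed"] else "open"
        totals[kind] += 1
        if normalize(s["pred"]) == normalize(s["truth"]):
            hits[kind] += 1
    return {k: (hits[k] / totals[k] if totals[k] else 0.0) for k in totals}


def bleu1(pred, truth):
    """Unigram BLEU with brevity penalty for one prediction/reference pair."""
    p = normalize(pred).split()
    r = normalize(truth).split()
    if not p or not r:
        return 0.0
    # Clipped unigram matches: each reference word can be matched at most
    # as many times as it occurs in the reference.
    overlap = sum((Counter(p) & Counter(r)).values())
    precision = overlap / len(p)
    # Penalize predictions shorter than the reference.
    bp = 1.0 if len(p) >= len(r) else math.exp(1 - len(r) / len(p))
    return bp * precision


# Toy data invented for this example; not taken from VQA-RAD or PathVQA.
samples = [
    {"pred": "yes", "truth": "Yes", "closed": True},
    {"pred": "no", "truth": "yes", "closed": True},
    {"pred": "right lower lobe", "truth": "right lower lobe", "closed": False},
    {"pred": "left lung", "truth": "right lung", "closed": False},
]

print(split_accuracy(samples))            # {'closed': 0.5, 'open': 0.5}
print(bleu1("left lung", "right lung"))   # 0.5
```

Exact match is strict for free-form answers: "left lung" against "right lung" counts as wrong for open accuracy, while the unigram BLEU still credits the shared word. This is one reason a sentence-level overlap metric is a useful complement to accuracy for generated answers.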
