Visual Question Answering Models Research Articles

Overview

121 Articles

Published in last 50 years

Articles published on Visual Question Answering Models

Vision–Language Model for Visual Question Answering in Medical Imagery

Medical images play a critical role in clinical and healthcare settings. A mature medical visual question answering (VQA) system can improve diagnosis by answering clinical questions posed about a medical image. Despite its enormous potential for healthcare services, this technology is still in its infancy and far from practical use. This paper introduces an approach based on a transformer encoder-decoder architecture. Specifically, we extract image features using the vision transformer (ViT) model and embed the question using a textual transformer encoder. We then concatenate the resulting visual and textual representations and feed them into a multi-modal decoder that generates the answer autoregressively. In the experiments, we validate the proposed model on two medical VQA datasets, VQA-RAD and PathVQA. The model shows promising results compared to existing solutions: it yields closed and open accuracies of 84.99% and 72.97%, respectively, on VQA-RAD, and 83.86% and 62.37%, respectively, on PathVQA. Other metrics, such as the BLEU score, which measures the alignment between the predicted and true answer sentences, are also reported.
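The closed and open accuracies quoted above are reported separately per question type. As a rough illustration of how such a split can be computed, here is a minimal sketch that assumes exact string match after lowercasing and whitespace normalization; the abstract does not state the paper's actual scoring protocol, and the function names, the "closed"/"open" labels, and the toy records are all hypothetical.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(answer.lower().split())


def accuracy_by_type(records):
    """Return exact-match accuracy (in percent) for each question type.

    Each record is a (question_type, predicted_answer, true_answer) tuple.
    The "closed"/"open" labels are an assumption made for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qtype, predicted, truth in records:
        total[qtype] += 1
        if normalize(predicted) == normalize(truth):
            correct[qtype] += 1
    return {qtype: 100.0 * correct[qtype] / total[qtype] for qtype in total}


# Toy records for illustration only; these are not data from the paper.
records = [
    ("closed", "Yes", "yes"),
    ("closed", "no", "yes"),
    ("open", "left lung", "Left  lung"),
    ("open", "pneumonia", "pleural effusion"),
    ("open", "liver", "liver"),
]

scores = accuracy_by_type(records)
print(scores)
```

Exact match is a strict criterion for open-ended answers, which is one reason sentence-level metrics such as BLEU are often reported alongside it.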

Open Access

Bioengineering

Mar 20, 2023
Yakoub Bazi + 3

Articles published on Visual Question Answering Models

Vision–Language Model for Visual Question Answering in Medical Imagery

Parallel multi-head attention and term-weighted question embedding for medical visual question answering.

Dual Attention and Question Categorization-Based Visual Question Answering

Learning visual question answering on controlled semantic noisy labels

Visual question answering model for fruit tree disease decision-making based on multimodal deep learning.

RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering

Deep Residual Weight-Sharing Attention Network With Low-Rank Attention for Visual Question Answering

Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering

Incorporation of Question Segregation Procedure in Visual Question Answering Models

Reducing Vision-Answer biases for Multiple-choice VQA.

Visual Question Answering reasoning with external knowledge based on bimodal graph neural network

An effective spatial relational reasoning networks for visual question answering.

VQAMix: Conditional Triplet Mixup for Medical Visual Question Answering

Attention in Reasoning: Dataset, Analysis, and Modeling.

A Multi-level Mesh Mutual Attention Model for Visual Question Answering

Safety compliance checking of construction behaviors using visual question answering

Answering knowledge-based visual questions via the exploration of Question Purpose

AMAM: An Attention-based Multimodal Alignment Model for Medical Visual Question Answering

Computer Science Diagram Understanding with Topology Parsing

MobiVQA