Optimal Image Feature Ranking and Fusion for Visual Question Answering

Sruthy Manmadhan, Binsu C Kovoor

doi:10.1007/978-981-15-5788-0_10

Abstract

Visual Question Answering (VQA) is a relatively new and challenging multi-modal task that seeks to answer a question posed about a given image. This AI-complete task has attracted many researchers in computer vision (CV) and natural language processing (NLP) because of its many potential applications. VQA algorithms generally follow three steps: image feature extraction, question feature extraction, and joint comprehension of the two to generate an appropriate answer. Existing VQA systems have paid little attention to input feature extraction and have focused instead on different ways of multi-modal embedding. This paper proposes to improve VQA through feature-level fusion of visual information. The goal of feature fusion is to consolidate relevant information from two or more feature vectors into a single vector with greater discriminative power. Rather than simple concatenation, this paper uses discriminative correlation analysis (DCA) for fusion, which is the only method that incorporates class structure into feature-level fusion. Because VQA systems are generally modeled as classification systems that treat the correct answers as classes, class-specific DCA is well suited to the task. The resulting fused feature vectors lie close to the right answers and thus strengthen the role of image understanding in VQA. Experimental results on the DAQUAR dataset, with mutual information (MI) as the evaluation metric, show the effectiveness of the new approach.
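
The abstract does not say how MI is computed, so the following is only a minimal sketch of the standard empirical definition for discrete variables, I(X;Y) = Σ p(x,y) log2( p(x,y) / (p(x) p(y)) ), applied to answer classes. The function name `mutual_information` and the toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two label sequences.

    Assumption: both inputs are discrete (e.g. quantized feature values and
    answer classes). This is the textbook estimator, not the paper's code.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("sequences must be non-empty")

    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: answer classes and two discretized features.
answers = ["red", "red", "blue", "blue"]
informative = ["a", "a", "b", "b"]      # determines the answer exactly
uninformative = ["a", "b", "a", "b"]    # independent of the answer

print(mutual_information(informative, answers))
print(mutual_information(uninformative, answers))
```

Under this definition, a feature that fully determines the answer class scores higher than one that is independent of it, which is the sense in which MI can reflect how close a feature lies to the right answers.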

