Overcoming the Limitations of Learning-Based VQA for Counting Questions with Zero-Shot Learning

A Lubna, Saidalavi Kalady

doi:10.1142/s0218213024500192

Abstract

Visual question answering (VQA) research has garnered increasing attention in recent years. It is considered a visual Turing test because it requires a computer to respond to textual questions based on an image. Solving VQA requires expertise in computer vision, natural language processing, knowledge understanding, and reasoning. Most VQA techniques train a model to learn the combination of image and question features along with the expected answer; the techniques chosen for extracting and combining the image and question features vary from model to model. Teaching a model the question–answer pattern in this way is ineffective for queries that involve counting and reasoning, and it requires considerable resources and large datasets for training. General VQA datasets feature a restricted number of items as responses to counting questions ([Formula: see text]), and the distribution of the answers is not uniform. To investigate these issues in VQA, we created synthetic datasets that could be modified to adjust the number of objects in the image and the amount of occlusion. Specifically, a zero-shot learning VQA system was devised for counting-related questions; it provides answers by analyzing the output of an object detector and the query keywords. On the synthetic datasets, our model generated 100% correct results. Testing on the benchmark datasets Task Directed Image Understanding Challenge (TDIUC) and TallyQA-simple indicated that the proposed model matched the performance of the learning-based baseline models. This methodology can be used efficiently for counting VQA questions confined to certain domains when the number of items to be counted is significant.
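
The idea of answering a counting question from detector output and question keywords can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `count_answer`, the `(label, score)` detection format, the confidence threshold, and the crude plural handling are all assumptions made for illustration, and no real object detector is invoked.

```python
import re
from collections import Counter


def count_answer(question, detections, score_threshold=0.5):
    """Count detected objects whose class label is named in the question.

    `detections` is a hypothetical detector output: a list of
    (label, confidence) pairs. Nothing is trained here, so the
    procedure is zero-shot with respect to question-answer pairs.
    """
    # Lowercase the question and split it into alphabetic words.
    words = re.findall(r"[a-z]+", question.lower())
    # Crude singularization (an assumption): "dogs" also yields "dog".
    keywords = set(words) | {w[:-1] for w in words if w.endswith("s") and len(w) > 3}
    # Tally confident detections per class label.
    counts = Counter(
        label.lower() for label, score in detections if score >= score_threshold
    )
    # The answer is the number of detections whose label is a question keyword.
    return sum(n for label, n in counts.items() if label in keywords)


# Hypothetical detections for a single image.
detections = [("dog", 0.9), ("dog", 0.8), ("cat", 0.95), ("dog", 0.3)]
print(count_answer("How many dogs are in the image?", detections))
```

The sketch matches only single-word labels and ignores attributes or spatial relations in the question, so it covers only the simplest counting queries.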

More From: International Journal on Artificial Intelligence Tools

Similar Papers

VQAR: Review on Information Retrieval Techniques based on Computer Vision and Natural Language Processing
Shivangi Modi ... Dhatri Pandya
01 Mar 2019

Towards Open Ended and Free Form Visual Question Answering: Modeling VQA as a Factoid Question Answering Problem
Abhishek Narayanan ... S Natarajan
01 Jan 2020

Optimal Image Feature Ranking and Fusion for Visual Question Answering
Sruthy Manmadhan ... Binsu C Kovoor
09 Sep 2020

Visual question answering models Evaluation
S. Sarath ... J. Amudha
01 Jun 2020
