Visual Question Answering Using Deep Learning: A Survey and Performance Analysis

Yash Srivastava, Shiv Ram Dubey, Snehasis Mukherjee, Vaishnav Murali

doi:10.1007/978-981-16-1092-9_7

Abstract

The Visual Question Answering (VQA) task combines the challenges of visual and linguistic processing in order to answer basic ‘common sense’ questions about given images. Given an image and a question in natural language, a VQA system tries to find the correct answer using the visual elements of the image and inferences drawn from the textual question. In this survey, we cover and discuss recent datasets released in the VQA domain, which address various question formats and the robustness of machine-learning models. Next, we discuss new deep learning models that have shown promising results on the VQA datasets. Finally, we present and discuss results that we computed for the vanilla VQA model, the Stacked Attention Network, and the VQA Challenge 2017 winner model. We also provide a detailed analysis along with the challenges and future research directions.

Keywords: Visual Question Answering, Artificial intelligence, Human computer interaction, Deep learning, CNN, LSTM

Similar Papers

VQAR: Review on Information Retrieval Techniques based on Computer Vision and Natural Language Processing
Shivangi Modi ... Dhatri Pandya
01 Mar 2019

A Survey on Visual Question Answering
Mrinal Banchhor ... Pradeep Singh
01 Oct 2021

ConceptBert: Concept-Aware Representation for Visual Question Answering
François Gardères ... Freddy Lecue
01 Jan 2020

Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm
Mehrdad Alizadeh ... Barbara Di Eugenio
International Journal of Semantic Computing, Vol. 14
01 Jun 2020
