Abstract

Visual Question Answering (VQA) is a task that spans two major fields of AI, Natural Language Processing and Computer Vision: given an image and a question posed in natural language, the goal is to produce the correct answer. The task is challenging because visual and linguistic processing must be combined to answer common-sense questions about a given image, and it requires reasoning over visual features and objects together with general knowledge to predict the correct answer. In this survey, we discuss state-of-the-art methodologies, algorithms, and datasets for VQA, along with a timeline of breakthroughs in the field. We also explore common techniques that combine convolutional neural networks with recurrent networks such as LSTMs or GRUs to map the question and the image into a common representation space. Each dataset contains questions at a distinct level of complexity, and different reasoning capabilities are required to handle complex images. We further cover recently released VQA datasets, the types of question patterns they contain, and the machine learning (ML) models applied to them. Finally, we discuss deep learning (DL) models that have shown excellent performance on various benchmark VQA datasets.
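
To make the CNN-plus-recurrent fusion idea concrete, the sketch below shows one common pattern from this family of models: a CNN encodes the image, an LSTM encodes the question, and the two feature vectors are combined by element-wise multiplication before a classifier over a fixed answer vocabulary. This is a minimal illustrative PyTorch sketch, not a specific model from the survey; the backbone choice, feature dimensions, vocabulary size, and answer-set size are assumed placeholders.

```python
# Minimal CNN + LSTM fusion model for VQA (illustrative sketch only).
# All hyperparameters below are placeholder assumptions, not values from the survey.
import torch
import torch.nn as nn
import torchvision.models as models

class SimpleVQAModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300,
                 hidden_dim=512, num_answers=1000):
        super().__init__()
        # Image encoder: a CNN backbone whose final classifier is replaced
        # by a projection into the shared embedding space.
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Linear(cnn.fc.in_features, hidden_dim)
        self.image_encoder = cnn
        # Question encoder: word embeddings fed to an LSTM (a GRU works the same way).
        self.word_embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Classifier over a fixed answer vocabulary, applied to the fused features.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, images, questions):
        # images: (B, 3, 224, 224); questions: (B, T) integer token ids
        img_feat = torch.tanh(self.image_encoder(images))   # (B, hidden_dim)
        _, (h_n, _) = self.lstm(self.word_embed(questions))
        q_feat = torch.tanh(h_n[-1])                         # (B, hidden_dim)
        fused = img_feat * q_feat  # element-wise fusion in the common space
        return self.classifier(fused)                        # answer logits

# Example forward pass with random inputs.
model = SimpleVQAModel()
images = torch.randn(2, 3, 224, 224)
questions = torch.randint(1, 10000, (2, 12))
logits = model(images, questions)  # shape: (2, 1000)
```

Element-wise multiplication is only one fusion choice; concatenation, bilinear pooling, and attention-based fusion are common alternatives discussed in the VQA literature.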
