Abstract

Enabling a computer system to understand its surroundings and to process information the way a human does has long been a major focus of Computer Science. One way to work toward this kind of artificial intelligence is Visual Question Answering (VQA): a trained system that answers, in natural language, questions about a given image. VQA is a general-purpose system that can be applied to any image-based scenario, given adequate training on relevant data. It is built with Neural Networks, particularly Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). In this study, we compare different approaches to VQA and explore a CNN-based model in detail. With continued progress in Computer Vision and question answering systems, VQA is becoming an essential system that can handle multiple scenarios, each with its own data.

Highlights

  • Artificial Intelligence (AI) has always been seen as a robotic system with the ability to think like a human, but technically AI can be divided into areas such as Natural Language Processing (NLP), Computer Vision, Image Processing, and Text Processing

  • We propose an approach to implementing Visual Question Answering (VQA) using Convolutional Neural Networks and Recurrent Neural Networks, incorporating external knowledge about the images in the dataset

  • There have been several approaches to the challenge of VQA, mainly based on Artificial Neural Networks: Convolutional Neural Networks (Qi Wu 2017) and Recurrent Neural Networks (Iqbal Chowdhury et al.)

Summary

Introduction

Artificial Intelligence (AI) has always been seen as a robotic system with the ability to think like a human, but technically AI can be divided into areas such as Natural Language Processing (NLP), Computer Vision, Image Processing, and Text Processing. An answer is generated in natural language. Because this task consists of two different kinds of processing, both the individual processing of the image and the question and the mapping between image features and the question must be done accurately to achieve the desired result. This depends on how the system is trained on the dataset and on the choice of properly fine-tuned Neural Networks. The use of external knowledge helps the system map the image information to its corresponding question-answer pair by providing additional details about the features in the image. This helps reduce random answers that are irrelevant to the image or the question.
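The two-part processing described above can be illustrated with a minimal sketch. It is not the model from the paper. It assumes that the image and the question have already been encoded into equal-length feature vectors (standing in for CNN and RNN outputs), that the two are combined by an element-wise product (one common fusion choice, assumed here), and that a single linear layer followed by a softmax scores a fixed list of candidate answers. All names (`fuse`, `predict_answer`, `answer_weights`) and all numbers are hypothetical toy values.

```python
import math


def fuse(image_features, question_features):
    """Combine the two feature vectors with an element-wise product (assumed fusion)."""
    if len(image_features) != len(question_features):
        raise ValueError("image and question features must have the same length")
    return [i * q for i, q in zip(image_features, question_features)]


def softmax(scores):
    """Turn raw scores into probabilities, shifting by the max for numerical stability."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_features, question_features, answer_weights, answers):
    """Score each candidate answer from the fused features and return the most likely one."""
    fused = fuse(image_features, question_features)
    # One weight row per candidate answer: a single linear layer without bias.
    scores = [sum(w * f for w, f in zip(row, fused)) for row in answer_weights]
    probabilities = softmax(scores)
    best = max(range(len(answers)), key=lambda k: probabilities[k])
    return answers[best], probabilities


# Toy stand-ins for a CNN image encoding and an RNN question encoding.
image_features = [1.0, 0.0, 2.0]
question_features = [0.5, 1.0, 1.0]
answer_weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
answers = ["yes", "no"]

answer, probabilities = predict_answer(
    image_features, question_features, answer_weights, answers
)
print(answer, probabilities)
```

In a trained system the weights would be learned from question-answer pairs rather than set by hand, and the external knowledge mentioned above would supply additional features; neither is shown in this sketch.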
