Visual Question Answering using Convolutional Neural Networks

K P Moholkar et al.

doi:10.17762/turcomat.v12i1s.1602

Abstract

Enabling a computer system to understand its surroundings and to reason about that information as a human would has long been a central goal of Computer Science. Visual Question Answering (VQA) is one route toward this kind of artificial intelligence. A VQA system is trained to answer, in natural language, questions about a given image. It is a general-purpose approach that can be applied to any image-based scenario, given adequate training on relevant data. This is achieved with neural networks, in particular Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). In this study, we compare different approaches to VQA and explore a CNN-based model. With continued progress in Computer Vision and question answering systems, VQA is becoming an essential system that can handle multiple scenarios with their respective data.

Highlights

Artificial Intelligence (AI) has long been pictured as a robotic system able to think like a human, but technically AI can be divided into subfields such as Natural Language Processing (NLP), Computer Vision, Image Processing, and Text Processing.
We propose an approach to implementing Visual Question Answering (VQA) using Convolutional Neural Networks and Recurrent Neural Networks, incorporating external knowledge about the images in the dataset.
There have been several approaches to the VQA challenge, mainly based on artificial neural networks: Convolutional Neural Networks (Qi Wu 2017) and Recurrent Neural Networks (Iqbal Chowdhury et al).

Summary

Introduction

Artificial Intelligence (AI) has long been pictured as a robotic system able to think like a human, but technically AI can be divided into subfields such as Natural Language Processing (NLP), Computer Vision, Image Processing, and Text Processing. An answer is generated in natural language. Because this task consists of two different kinds of processing, the individual processing of the image and of the question, as well as the image-feature mapping, must be done accurately to achieve the desired result. This depends on how the system is trained on the dataset and on the choice of properly fine-tuned neural networks. External knowledge helps the system map the image information to its corresponding question-answer pair by providing additional details about the features in the image. This helps reduce random answers that are irrelevant to the image or the question.
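
The two processing paths and their mapping to an answer can be illustrated with a minimal sketch. This is not the authors' model: the summary does not specify an architecture, so the example assumes a common baseline pattern (a convolution with pooling for the image, a simple recurrent encoder for the question, concatenation of the two with an external-knowledge vector, and a softmax over candidate answers). All names, sizes, the toy vocabulary, the answer list, and the `knowledge` vector are hypothetical, and the weights are random rather than trained, so the predicted answer is meaningless. The code only shows how the data flows.

```python
import math
import random

def conv2d_valid(image, kernel):
    """Slide a k x k kernel over a 2-D image (no padding, stride 1), then apply ReLU."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    return [[max(0.0, sum(image[i + a][j + b] * kernel[a][b]
                          for a in range(k) for b in range(k)))
             for j in range(out_w)]
            for i in range(out_h)]

def image_features(image, kernels):
    """CNN stand-in: one convolutional layer followed by global max pooling per kernel."""
    return [max(max(row) for row in conv2d_valid(image, k)) for k in kernels]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def question_features(token_ids, embed, w_in, w_rec):
    """RNN stand-in: Elman recurrence h_t = tanh(W_in e_t + W_rec h_{t-1})."""
    h = [0.0] * len(w_rec)
    for t in token_ids:
        x = matvec(w_in, embed[t])
        r = matvec(w_rec, h)
        h = [math.tanh(a + b) for a, b in zip(x, r)]
    return h  # final hidden state summarizes the question

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def answer_distribution(img_vec, q_vec, knowledge_vec, w_out):
    """Concatenate the three feature vectors and score each candidate answer."""
    joint = img_vec + q_vec + knowledge_vec
    return softmax(matvec(w_out, joint))

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy setup: every value below is an illustrative assumption.
rng = random.Random(0)
vocab = {"what": 0, "color": 1, "is": 2, "the": 3, "ball": 4}
answers = ["red", "blue", "yes", "no"]
HIDDEN, EMBED, N_KERNELS, N_KNOW = 4, 3, 2, 2

kernels = [rand_matrix(rng, 2, 2) for _ in range(N_KERNELS)]
embed = rand_matrix(rng, len(vocab), EMBED)
w_in = rand_matrix(rng, HIDDEN, EMBED)
w_rec = rand_matrix(rng, HIDDEN, HIDDEN)
w_out = rand_matrix(rng, len(answers), N_KERNELS + HIDDEN + N_KNOW)

image = [[rng.random() for _ in range(5)] for _ in range(5)]  # 5 x 5 grayscale image
question = [vocab[w] for w in "what color is the ball".split()]
knowledge = [1.0, 0.0]  # hypothetical encoding of external facts about the image

img_vec = image_features(image, kernels)
q_vec = question_features(question, embed, w_in, w_rec)
probs = answer_distribution(img_vec, q_vec, knowledge, w_out)
predicted = answers[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

In a trained system the convolution kernels, embeddings, and output weights would be learned from image-question-answer triples. Concatenation is only one way to fuse the features. Element-wise products and attention are common alternatives.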

Related Work

Datasets

Findings

Discussion and Conclusion

Journal: Turkish Journal of Computer and Mathematics Education (TURCOMAT)
Publication Date: Apr 11, 2021
License type: cc-by

Similar Papers

Learning Convolutional Text Representations for Visual Question Answering
Zhengyang Wang ... Shuiwang Ji
07 May 2018

Dual self-attention with co-attention networks for visual question answering
Yun Liu ... Zhoujun Li
Pattern Recognition | VOL. 117
09 Apr 2021

Visual Question Answering with External Knowledge
Santhosh Voruganti ... Sravanthi M
SSRN Electronic Journal | VOL. -
01 Jan 2020

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
Qi Wu ... Peng Wang
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 40
26 May 2017
