A Novel Approach on Visual Question Answering by Parameter Prediction using Faster Region Based Convolutional Neural Network

Sudan Jha, Raghvendra Kumar, Anirban Dey, Vijender Kumar-Solanki

doi:10.9781/ijimai.2018.08.004

Abstract

Visual Question Answering (VQA) is a challenging task at the intersection of Natural Language Processing (NLP) and Computer Vision (CV). In this task, a machine must answer a natural language question about an image. Questions can be open-ended or multiple choice. VQA datasets contain three main components: questions, images and answers. Researchers have approached the VQA problem with deep learning architectures that combine two networks, a Convolutional Neural Network (CNN) for the visual (image) representation and a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) for the textual (question) representation, and train the combined network end to end to generate the answer. Those models are able to answer common and simple questions that are directly related to the image’s content. But different types of questions need different levels of understanding to produce correct answers. To solve this problem, we use Faster Region-based CNN (R-CNN) for extracting image features, with an extra fully connected layer whose weights are dynamically obtained from LSTM cells according to the question. We claim in this paper that a single R-CNN architecture can solve the problems related to VQA by modifying the weights in the parameter prediction layer. We trained the network end to end by Stochastic Gradient Descent (SGD) using pretrained Faster R-CNN and LSTM and tested it on benchmark VQA datasets.
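The idea of a fully connected layer whose weights depend on the question can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the LSTM output is mapped to layer weights, so the linear predictor below is an assumption, and the names `DynamicParameterLayer`, `answer_distribution`, `image_feat`, `q_color` and `q_count` are hypothetical. Random vectors stand in for Faster R-CNN image features and LSTM question encodings, and only the Python standard library is used.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class DynamicParameterLayer:
    """Fully connected layer whose weights are predicted from the question."""

    def __init__(self, question_dim, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.out_dim = out_dim
        # Assumption: a linear map from the question encoding to all
        # in_dim * out_dim weights of the dynamic layer.
        self.predictor = random_matrix(out_dim * in_dim, question_dim, rng)

    def predict_weights(self, question_vec):
        flat = matvec(self.predictor, question_vec)
        # Reshape the flat prediction into an out_dim x in_dim matrix.
        return [flat[r * self.in_dim:(r + 1) * self.in_dim]
                for r in range(self.out_dim)]

    def forward(self, image_feat, question_vec):
        # The same image features pass through question-specific weights.
        return matvec(self.predict_weights(question_vec), image_feat)


def answer_distribution(layer, classifier, image_feat, question_vec):
    """Probability over candidate answers for one image/question pair."""
    hidden = [math.tanh(h) for h in layer.forward(image_feat, question_vec)]
    return softmax(matvec(classifier, hidden))


rng = random.Random(1)
image_feat = [rng.random() for _ in range(8)]  # stand-in for Faster R-CNN features
q_color = [1.0, 0.0, 0.5, 0.0]                 # stand-in for an LSTM question encoding
q_count = [0.0, 1.0, 0.0, 0.5]                 # a second, different question
layer = DynamicParameterLayer(question_dim=4, in_dim=8, out_dim=6)
classifier = random_matrix(3, 6, rng)          # 3 hypothetical candidate answers

p_color = answer_distribution(layer, classifier, image_feat, q_color)
p_count = answer_distribution(layer, classifier, image_feat, q_count)
```

In this sketch the two questions produce different weight matrices for the same image, so one image network can yield different answer distributions per question. In an actual system the predictor and classifier would be trained end to end rather than left random.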

Highlights

Understanding an image with computer vision or image processing techniques is a complex procedure that has been studied over the last two decades
Deep learning architectures built on artificial neural networks have enhanced visual image understanding [1, 2, 3]. Object recognition from an image is done by a Convolutional Neural Network (CNN)
This paper focuses on a deep learning based model for both open-ended and multiple-choice questions

Summary

Introduction

Understanding an image with computer vision or image processing techniques is a complex procedure that has been studied over the last two decades. Researchers all over the world have applied image understanding to solve the problem of Visual Question Answering with machine learning. Deep learning architectures built on artificial neural networks have enhanced visual image understanding [1, 2, 3]. Object recognition from an image is done by a Convolutional Neural Network (CNN). The CNN produces the feature representation of the image, and LSTMs process the representation of the question and answer. Researchers directly combined both networks and trained them end to end to generate the answer [20, 21]. This kind of approach is able to answer common and simple questions that are related to the image’s content, i.e. ‘What is the

Journal: International Journal of Interactive Multimedia and Artificial Intelligence
Publication Date: Jan 1, 2019
Citations: 12
License type: cc-by
