Image Captioning and Visual Question Answering Based on Attributes and External Knowledge.

Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick, Peng Wang

doi:10.1109/tpami.2017.2708709

Abstract

Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper, we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state of the art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Specifically, we design a visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. In particular, it allows questions to be asked where the image alone does not contain the information required to select the appropriate answer. Our final model achieves the best reported results for both image captioning and visual question answering on several of the major benchmark datasets.
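
The idea of combining an internal representation of image content with information from a knowledge base can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the attribute vocabulary, the knowledge-base entries, the bag-of-words encoding, and every function name below are hypothetical, and the abstract does not specify how attributes are predicted or how the retrieved text is encoded. The sketch takes per-attribute scores (as a multi-label image classifier might produce), retrieves knowledge-base text for the top-scoring attributes, and concatenates both into one context vector that a text decoder could consume.

```python
from typing import Dict, List

# Hypothetical attribute vocabulary and toy knowledge base (not from the paper).
ATTRIBUTES: List[str] = ["dog", "umbrella", "beach", "frisbee"]
KNOWLEDGE_BASE: Dict[str, str] = {
    "dog": "the dog is a domesticated mammal",
    "umbrella": "an umbrella protects against rain or sun",
    "beach": "a beach is a landform along a body of water",
    "frisbee": "a frisbee is a flying disc used for play",
}


def top_attributes(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k attribute names with the highest predicted score."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


def attribute_vector(scores: Dict[str, float]) -> List[float]:
    """Fixed-length vector of attribute scores, ordered as in ATTRIBUTES."""
    return [scores.get(name, 0.0) for name in ATTRIBUTES]


def knowledge_vector(attrs: List[str], vocab: List[str]) -> List[float]:
    """Bag-of-words counts over the knowledge-base text retrieved for attrs."""
    words = " ".join(KNOWLEDGE_BASE.get(a, "") for a in attrs).split()
    return [float(words.count(w)) for w in vocab]


def build_context(scores: Dict[str, float], k: int, vocab: List[str]) -> List[float]:
    """Concatenate the attribute vector with the retrieved-knowledge vector."""
    attrs = top_attributes(scores, k)
    return attribute_vector(scores) + knowledge_vector(attrs, vocab)


if __name__ == "__main__":
    # Assumed classifier output for one image (made-up numbers).
    scores = {"dog": 0.9, "umbrella": 0.1, "beach": 0.7, "frisbee": 0.2}
    vocab = ["mammal", "rain", "water"]
    print(top_attributes(scores, 2))
    print(build_context(scores, 2, vocab))
```

In a full system the context vector would condition an RNN that generates the caption or answer; here it only shows how image-derived attributes can select which external knowledge is brought in.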

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: May 26, 2017
Citations: 407

Similar Papers

What Value Do Explicit High Level Concepts Have in Vision to Language Problems?
Qi Wu ... Anton Van Den Hengel
01 Jun 2016

Image Captioning using Convolutional Neural Networks and Recurrent Neural Network
Rachel Calvin ... Shravya Suresh
02 Apr 2021

Image Captioning using Convolutional Neural Networks and Long Short Term Memory Cells
Hitoishi Das
International Journal of Recent Technology and Engineering (IJRTE) | VOL. 11
30 May 2022

Multi-source Multi-level Attention Networks for Visual Question Answering
Dongfei Yu ... Jianlong Fu
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 15
30 Apr 2019
