Abstract

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. A VQA model combines visual and textual features in order to answer questions grounded in an image. Current works in VQA focus on questions which are answerable by direct analysis of the question and image alone. We present a concept-aware algorithm, ConceptBert, for questions which require common sense or basic factual knowledge from external structured content. Given an image and a question in natural language, ConceptBert requires visual elements of the image and a Knowledge Graph (KG) to infer the correct answer. We introduce a multi-modal representation which learns a joint Concept-Vision-Language embedding inspired by the popular BERT architecture. We exploit the ConceptNet KG to encode common-sense knowledge and evaluate our methodology on the Outside Knowledge-VQA (OK-VQA) and VQA datasets.
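
To make the fused representation concrete, the sketch below shows one way a joint Concept-Vision-Language embedding could be assembled: visual region features, question token embeddings, and ConceptNet concept vectors are projected into a shared space and passed through a BERT-style self-attention encoder before answer classification. All dimensions, module names, and the fusion strategy are illustrative assumptions, not the exact ConceptBert architecture.

```python
# Hypothetical sketch of a Concept-Vision-Language fusion module.
# Dimensions, module names, and the fusion strategy are assumptions
# for illustration, not the published ConceptBert implementation.
import torch
import torch.nn as nn


class ConceptVisionLanguageFusion(nn.Module):
    def __init__(self, vis_dim=2048, txt_dim=768, kg_dim=300,
                 hidden=768, num_answers=3129, num_layers=2, num_heads=8):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.kg_proj = nn.Linear(kg_dim, hidden)
        # BERT-style self-attention over the concatenated token sequence.
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, vis_feats, txt_feats, kg_feats):
        # vis_feats: (B, n_regions, vis_dim)  detected-object features
        # txt_feats: (B, n_tokens, txt_dim)   question token embeddings
        # kg_feats:  (B, n_concepts, kg_dim)  ConceptNet concept embeddings
        tokens = torch.cat([self.vis_proj(vis_feats),
                            self.txt_proj(txt_feats),
                            self.kg_proj(kg_feats)], dim=1)
        fused = self.encoder(tokens)
        # Mean-pool the fused sequence and score the answer vocabulary.
        return self.classifier(fused.mean(dim=1))


# Example with random tensors standing in for real features.
model = ConceptVisionLanguageFusion()
logits = model(torch.randn(2, 36, 2048),   # 36 detected image regions
               torch.randn(2, 20, 768),    # 20 question tokens
               torch.randn(2, 5, 300))     # 5 retrieved concepts
print(logits.shape)  # torch.Size([2, 3129])
```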

Highlights

  • ConceptBert is a concept-aware VQA model that fuses visual features, question embeddings, and ConceptNet Knowledge Graph embeddings in a joint Concept-Vision-Language representation inspired by BERT, targeting questions that require common sense or basic factual knowledge

  • Adding the Knowledge Graph (KG) embeddings to the model leads to gains of 11.56% and 7.19% on the VQA and Outside Knowledge-VQA (OK-VQA) datasets, respectively

  • Since we report our results on the validation set, we removed it from the training phase so that the model relies only on the training set

Summary

Introduction

Visual Question Answering (VQA) was first introduced to bridge the gap between natural language processing and image understanding applications in the joint space of vision and language (Malinowski and Fritz, 2014). Most VQA benchmarks compute a question representation using word embedding techniques and Recurrent Neural Networks (RNNs), together with a set of object descriptors comprising bounding box coordinates and image feature vectors. Word and image representations are fused and fed to a network to train a VQA model. These approaches are practical when no knowledge beyond the visual content is required. Incorporating external knowledge introduces several advantages. External knowledge and supporting facts can improve the relational representation between the objects detected in the image, or between entities in the question and objects in the image. They also provide information on how the answer can be derived from the question, and the complexity of the questions can be increased based on the supporting knowledge base.
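
As a reference point for the pipeline described above, here is a minimal, generic VQA baseline: a GRU question encoder over word embeddings, pooled image features, elementwise-product fusion, and an answer classifier. Layer sizes, the vocabulary size, and the answer-set size are placeholder assumptions, and this is a sketch of the conventional approach rather than the model proposed in the paper.

```python
# Minimal sketch of the conventional VQA pipeline: an RNN question encoder
# fused with image features and fed to a classifier. All sizes and the
# elementwise-product fusion are illustrative choices.
import torch
import torch.nn as nn


class BaselineVQA(nn.Module):
    def __init__(self, vocab_size=15000, embed_dim=300, hidden=1024,
                 img_dim=2048, num_answers=3129):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.classifier = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_answers))

    def forward(self, question_ids, img_feats):
        # question_ids: (B, n_tokens) token indices for the question
        # img_feats:    (B, img_dim)  pooled CNN/detector image features
        _, h = self.rnn(self.embed(question_ids))  # h: (1, B, hidden)
        q = h.squeeze(0)
        v = torch.relu(self.img_proj(img_feats))
        fused = q * v                              # elementwise fusion
        return self.classifier(fused)


# Example forward pass with random inputs.
model = BaselineVQA()
logits = model(torch.randint(0, 15000, (2, 14)), torch.randn(2, 2048))
print(logits.shape)  # torch.Size([2, 3129])
```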
