Abstract
When glancing at an image, humans can infer what is hidden beyond what is visually obvious, such as objects' functions and people's intents and mental states. However, such a visual reasoning paradigm is tremendously difficult for computers, as it requires knowledge about how the world works. To address this issue, we propose the Commonsense Knowledge based Reasoning Model (CKRM), which acquires external knowledge to support the Visual Commonsense Reasoning (VCR) task, where a computer is expected to answer challenging visual questions. Our key ideas are: (1) To bridge the gap between recognition-level and cognition-level image understanding, we inject external commonsense knowledge through a multi-level knowledge transfer network that performs joint cell-level, layer-level, and attention-level information transfer. This effectively captures knowledge from different perspectives and allows the model to acquire human common sense in advance. (2) To further promote image understanding at the cognitive level, we propose a knowledge-based reasoning approach that relates the transferred knowledge to the visual content and composes the reasoning cues to derive the final answer. Experiments conducted on the challenging VCR dataset verify the effectiveness of the proposed CKRM approach, which significantly improves reasoning performance and achieves state-of-the-art accuracy.
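The abstract describes transferring knowledge into the task model at three levels (cell, layer, and attention). The sketch below illustrates one plausible way such a multi-level fusion module could look; the module name, tensor shapes, and gating choices are our own assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of the multi-level knowledge transfer idea, assuming a
# knowledge-pretrained recurrent model and a task model with matching hidden
# sizes. All names (MultiLevelTransfer, task_cell, know_cell, visual_feats)
# are hypothetical, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelTransfer(nn.Module):
    """Fuses knowledge-model states into the task model at three levels."""
    def __init__(self, dim):
        super().__init__()
        self.cell_gate = nn.Linear(2 * dim, dim)   # cell-level: gate over cell states
        self.layer_proj = nn.Linear(2 * dim, dim)  # layer-level: merge layer outputs
        self.attn_query = nn.Linear(dim, dim)      # attention-level: knowledge-guided query

    def forward(self, task_cell, know_cell, task_out, know_out, visual_feats):
        # Cell-level transfer: gated mix of task and knowledge cell states.
        g = torch.sigmoid(self.cell_gate(torch.cat([task_cell, know_cell], dim=-1)))
        fused_cell = g * task_cell + (1 - g) * know_cell

        # Layer-level transfer: project the concatenated layer outputs.
        fused_out = torch.tanh(self.layer_proj(torch.cat([task_out, know_out], dim=-1)))

        # Attention-level transfer: attend over visual regions with a
        # knowledge-conditioned query, then pool the region features.
        q = self.attn_query(fused_out).unsqueeze(1)                # (B, 1, D)
        scores = (q * visual_feats).sum(-1) / visual_feats.size(-1) ** 0.5
        attn = F.softmax(scores, dim=-1).unsqueeze(-1)             # (B, R, 1)
        attended = (attn * visual_feats).sum(1)                    # (B, D)
        return fused_cell, fused_out, attended
```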