Transformation Driven Visual Reasoning

Xin Hong, Yanyan Lan, Jiafeng Guo, Liang Pang, Xueqi Cheng

doi:10.1109/cvpr46437.2021.00683

Abstract

This paper defines a new visual reasoning paradigm by introducing an important factor: transformation. The motivation is that most existing visual reasoning tasks, such as CLEVR in VQA, are defined solely to test how well a machine understands concepts and relations within static settings, such as a single image. We argue that this kind of state-driven visual reasoning is limited in reflecting whether a machine can infer the dynamics between different states, an ability that Piaget's theory holds to be as important as state-level reasoning for human cognition. To tackle this problem, we propose a novel transformation-driven visual reasoning (TVR) task. Given both the initial and final states, the target is to infer the corresponding single-step or multi-step transformation, represented as a triplet (object, attribute, value) or a sequence of triplets, respectively. Following this definition, a new dataset, TRANCE, is constructed on the basis of CLEVR, with three levels of settings: Basic (single-step transformation), Event (multi-step transformation), and View (multi-step transformation with variant views). Experimental results show that state-of-the-art visual reasoning models perform well on Basic but are still far from human-level intelligence on Event and View. We believe the proposed paradigm will boost the development of machine visual reasoning. More advanced methods and real data need to be investigated in this direction. Resources for TVR are available at https://hongxin2019.github.io/TVR.
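To make the task format concrete, the following is a minimal sketch of the (object, attribute, value) triplet representation described in the abstract. It assumes states are already available as symbolic attribute tables, whereas the actual task starts from images; the names `State`, `Triplet`, `apply_step`, and `infer_transformation`, along with the object ids and attribute values, are hypothetical and are not taken from the TRANCE code. It also assumes each attribute of an object changes at most once, so diffing the two states recovers the steps up to ordering.

```python
from typing import Dict, List, Tuple

# Hypothetical symbolic stand-ins for the image states used in the task.
State = Dict[str, Dict[str, str]]  # object id -> {attribute: value}
Triplet = Tuple[str, str, str]     # (object, attribute, value)


def apply_step(state: State, step: Triplet) -> State:
    """Return a copy of `state` with one attribute of one object changed."""
    obj, attr, value = step
    new_state = {name: dict(attrs) for name, attrs in state.items()}
    new_state[obj][attr] = value
    return new_state


def infer_transformation(initial: State, final: State) -> List[Triplet]:
    """Diff two states and list every changed attribute as a triplet.

    Assumption: each (object, attribute) pair changes at most once, so the
    result is a valid transformation sequence up to reordering.
    """
    steps: List[Triplet] = []
    for obj in sorted(initial):
        for attr in sorted(initial[obj]):
            if final[obj][attr] != initial[obj][attr]:
                steps.append((obj, attr, final[obj][attr]))
    return steps


initial = {
    "obj0": {"color": "red", "size": "large"},
    "obj1": {"color": "blue", "size": "small"},
}
# A two-step transformation, analogous in spirit to the Event setting.
final = apply_step(apply_step(initial, ("obj0", "color", "green")),
                   ("obj1", "size", "large"))
print(infer_transformation(initial, final))
```

A single-step case (the Basic setting) yields a list with one triplet. The difficulty in the task itself comes from perceiving the states from images, which this sketch deliberately leaves out.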


Similar Papers

Visual Reasoning: From State to Transformation.
Xin Hong ... Yanyan Lan
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 45
01 Sep 2023

The Contribution of Increased Gamma Band Connectivity to Visual Non-Verbal Reasoning in Autistic Children: A MEG Study.
Natsumi Takesaki ... Laurent Mottron
PLOS ONE | VOL. 11
15 Sep 2016

A survey of neurosymbolic visual reasoning with scene graphs and common sense knowledge
M Jaleed Khan ... John G Breslin
Neurosymbolic Artificial Intelligence | VOL. -
13 May 2024

Exploring the potential role of visual reasoning tasks among inexperienced solvers
Intisar Natsheh ... Ronnie Karsenty
ZDM | VOL. 46
30 Oct 2013
