Bridging Visual and Textual Semantics: Towards Consistency for Unbiased Scene Graph Generation.

Ruonan Zhang, Gaoyun An, Yiqing Hao, Dapeng Oliver Wu

doi:10.1109/tpami.2024.3389030

Abstract

Scene Graph Generation (SGG) aims to detect visual relationships in an image. However, due to long-tailed bias, SGG is still far from practical. Most methods rely heavily on co-occurrence statistics to generate a balanced dataset, so they are dataset-specific and easily affected by noise. The fundamental cause is that SGG is simplified as a classification task rather than a reasoning task, which limits the ability to capture fine-grained details and makes ambiguity harder to handle. Imitating the dual process described in cognitive psychology, we propose a Visual-Textual Semantics Consistency Network (VTSCN) that models the SGG task as a reasoning process and significantly relieves the long-tailed bias. In VTSCN, as the rapid autonomous process (Type 1 process), we design a Hybrid Union Representation (HUR) module, which is divided into two steps: spatial awareness modeling and working memory modeling. As the higher-order reasoning process (Type 2 process), a Global Textual Semantics Modeling (GTS) module is designed to individually model the textual contexts with the word embeddings of pairwise objects. As the final associative process of cognition, a Heterogeneous Semantics Consistency (HSC) module is designed to balance the Type 1 and Type 2 processes. Overall, VTSCN offers a new approach to SGG model design by fully considering the human cognitive process. Experiments on the Visual Genome, GQA, and PSG datasets show that our method is superior to state-of-the-art methods, and ablation studies validate the effectiveness of VTSCN.
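To make the long-tailed bias mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the authors' method or evaluation protocol. The toy predicate labels, the majority-class baseline, and the helper names (`majority_baseline`, `overall_recall`, `mean_per_class_recall`) are all assumptions introduced for illustration. The sketch shows how a predictor that only follows label frequency can score well overall while failing on every rare predicate.

```python
from collections import Counter

# Hypothetical toy predicate labels with a long-tailed distribution
# (illustrative only; not drawn from Visual Genome, GQA, or PSG).
ground_truth = ["on"] * 8 + ["standing on"] * 1 + ["parked on"] * 1


def majority_baseline(labels):
    """Always predict the most frequent predicate (a pure frequency prior)."""
    most_common = Counter(labels).most_common(1)[0][0]
    return [most_common for _ in labels]


def overall_recall(truth, pred):
    """Fraction of all samples whose predicate is predicted correctly."""
    hits = sum(t == p for t, p in zip(truth, pred))
    return hits / len(truth)


def mean_per_class_recall(truth, pred):
    """Average of per-predicate recalls, so rare predicates count equally."""
    per_class = []
    for cls in sorted(set(truth)):
        idx = [i for i, t in enumerate(truth) if t == cls]
        hits = sum(pred[i] == cls for i in idx)
        per_class.append(hits / len(idx))
    return sum(per_class) / len(per_class)


predictions = majority_baseline(ground_truth)
print("overall recall:", overall_recall(ground_truth, predictions))
print("mean per-class recall:", mean_per_class_recall(ground_truth, predictions))
```

In this toy setting the frequency-only baseline gets 8 of 10 samples right, yet its per-class average is only one third, because it never predicts the two rare predicates. This gap is one common way to see why a biased model can look accurate while offering little fine-grained information.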

More From: IEEE transactions on pattern analysis and machine intelligence
