Abstract
Scene graph generation is an important task in computer vision aimed at improving the semantic understanding of the visual world. In this task, the model needs to detect objects and predict visual relationships between them. Most existing models predict relationships in parallel, assuming they are independent. While there are different ways to capture the dependencies between relationships, we explore a conditional approach motivated by the sequence-to-sequence (Seq2Seq) formalism. In contrast to previous research, our proposed model predicts visual relationships one at a time in an autoregressive manner, explicitly conditioning on the already predicted relationships. Drawing from translation models in NLP, we propose an encoder-decoder model built using Transformers, where the encoder captures global context and long-range interactions. The decoder then makes sequential predictions by conditioning on the scene graph constructed so far. In addition, we introduce a novel reinforcement learning-based training strategy tailored to Seq2Seq scene graph generation. By using a self-critical policy gradient training approach with Monte Carlo search, we directly optimize for the (mean) recall metrics and bridge the gap between training and evaluation. Experimental results on two public benchmark datasets demonstrate that our Seq2Seq learning approach achieves strong empirical performance, outperforming the previous state of the art, while remaining efficient in terms of training and inference time. Full code for this work is available at https://github.com/layer6ai-labs/SGG-Seq2Seq.
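To make the autoregressive formulation concrete, the sketch below shows one way such an encoder-decoder could be wired up: detected object features are encoded for global context, and (subject, predicate, object) triples are decoded one at a time, with each step conditioned on the triples produced so far. This is a minimal illustration, not the authors' implementation; all module names, dimensions, vocabulary sizes, and the greedy decoding loop are assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of an autoregressive, Seq2Seq-style
# scene graph model: object features are encoded for global context, and
# (subject, predicate, object) triples are predicted sequentially, each step
# conditioned on the triples generated so far. Dimensions and vocabulary
# sizes are illustrative assumptions.
import torch
import torch.nn as nn


class Seq2SeqSGGSketch(nn.Module):
    def __init__(self, obj_feat_dim=1024, d_model=256, num_predicates=51, max_objects=64):
        super().__init__()
        self.obj_proj = nn.Linear(obj_feat_dim, d_model)  # project detector features
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=3)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=3)
        # embeddings for the components of a relationship triple
        self.subj_emb = nn.Embedding(max_objects, d_model)
        self.obj_emb = nn.Embedding(max_objects, d_model)
        self.pred_emb = nn.Embedding(num_predicates, d_model)
        self.start_token = nn.Parameter(torch.zeros(1, 1, d_model))
        # heads that score the next triple given the decoded state
        self.subj_head = nn.Linear(d_model, max_objects)
        self.obj_head = nn.Linear(d_model, max_objects)
        self.pred_head = nn.Linear(d_model, num_predicates)

    def forward(self, obj_feats, num_steps=10):
        """obj_feats: (batch, num_objects, obj_feat_dim) detector features."""
        memory = self.encoder(self.obj_proj(obj_feats))  # global context over objects
        batch = obj_feats.size(0)
        tgt = self.start_token.expand(batch, 1, -1)      # decoding starts from an empty graph
        triples = []
        for _ in range(num_steps):
            h = self.decoder(tgt, memory)[:, -1]         # state after previously decoded triples
            subj = self.subj_head(h).argmax(-1)          # greedy choices, for the sketch only
            pred = self.pred_head(h).argmax(-1)
            obj = self.obj_head(h).argmax(-1)
            triples.append((subj, pred, obj))
            # feed the just-predicted triple back in so later steps condition on it
            step_emb = (self.subj_emb(subj) + self.pred_emb(pred) + self.obj_emb(obj)).unsqueeze(1)
            tgt = torch.cat([tgt, step_emb], dim=1)
        return triples


# usage: fake detector features for 2 images with 16 objects each
model = Seq2SeqSGGSketch()
relationships = model(torch.randn(2, 16, 1024), num_steps=5)
```

The self-critical training idea mentioned in the abstract can similarly be sketched as a REINFORCE-style loss in which the recall of a sampled scene graph is baselined by the recall of the greedily decoded graph, so the model is rewarded only for beating its own greedy output. The helper below and its toy inputs are hypothetical; they only illustrate the shape of such a loss, not the authors' exact procedure with Monte Carlo search.

```python
import torch

def self_critical_loss(sample_log_prob, sample_recall, greedy_recall):
    # Advantage = recall of the sampled graph minus recall of the greedy graph
    # (self-critical baseline); the gradient increases the likelihood of
    # samples that score better than greedy decoding.
    advantage = sample_recall - greedy_recall
    return -(advantage.detach() * sample_log_prob).mean()

# toy values for a batch of two images (hypothetical numbers)
loss = self_critical_loss(torch.tensor([-2.3, -1.7], requires_grad=True),
                          torch.tensor([0.45, 0.30]), torch.tensor([0.40, 0.35]))
loss.backward()
```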