Abstract

The scene graph generation (SGG) task is plagued by insufficient and long-tailed training samples. External text knowledge has therefore been introduced into SGG to augment the training dataset. However, rare relations or low-frequency events, such as <dog, standing on, surfboard>, remain hard to discover because they also rarely appear in external texts. To address these issues, this paper proposes a model-agnostic Bipartite Graph Network with Dual-Group Message Passing (DG-BGN). It extends the relation space of each object pair in the training dataset based on the similarity between different object pairs, combining external text knowledge with internal visual information. Specifically, the DG-BGN framework consists of three parts: bipartite graph construction (BGC), dual-group message propagation (DG-MP), and pseudo label generation (PLG). In BGC, external texts are aligned with the visual dataset to mine information from multimodal resources. Using the object pair as the grouping index, a dual bipartite graph comprising an intra-group graph and an inter-group graph is established to represent the predicate label probability distribution of object pair instances. The inter-group bipartite graph is built from the similarity between object pair instances. In DG-MP, efficient message passing is run on both kinds of bipartite graph through graph convolutional neural networks to refine the graphs, so that each object pair instance learns from the instances similar to it. Finally, in PLG, pseudo labels are obtained by aggregating the refined graphs and are then used to train SGG models in a fully supervised way. Systematic experiments on the Visual Genome and Conceptual Captions datasets show that our method performs better at discovering rare relations.
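The idea that an object pair instance can learn rare predicates from similar instances can be illustrated with a toy example. The sketch below is not DG-BGN itself: it replaces the learned graph convolutional networks with plain similarity-weighted label propagation, and the feature vectors, the predicate list, the mixing weight `alpha`, and the threshold are all hypothetical values chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def propagate(features, dists, alpha=0.5, steps=1):
    """Mix each instance's predicate distribution with the similarity-weighted
    average of the other instances' distributions.

    Assumption: this hand-written update stands in for a learned
    message-passing layer; alpha controls how much is taken from neighbours.
    """
    n = len(dists)
    # Pairwise similarity weights, no self-loops, negatives clipped to zero.
    sim = [[max(cosine(features[i], features[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    for _ in range(steps):
        updated = []
        for i in range(n):
            total = sum(sim[i])
            if total == 0.0:
                # No similar instance: keep the distribution unchanged.
                updated.append(list(dists[i]))
                continue
            k_range = range(len(dists[i]))
            neighbour = [sum(sim[i][j] * dists[j][k] for j in range(n)) / total
                         for k in k_range]
            updated.append([(1 - alpha) * dists[i][k] + alpha * neighbour[k]
                            for k in k_range])
        dists = updated
    return dists

def pseudo_labels(dists, predicates, threshold=0.2):
    """Keep every predicate whose refined probability reaches the threshold."""
    return [[p for p, prob in zip(predicates, d) if prob >= threshold]
            for d in dists]

# Hypothetical toy data: three object pair instances and three predicates.
predicates = ["on", "standing on", "riding"]
features = [[1.0, 0.1],   # <dog, surfboard>
            [0.9, 0.2],   # <person, surfboard>
            [0.0, 1.0]]   # <cup, table>
initial = [[1.0, 0.0, 0.0],   # only the frequent predicate "on" was observed
           [0.2, 0.6, 0.2],
           [1.0, 0.0, 0.0]]

refined = propagate(features, initial)
labels = pseudo_labels(refined, predicates)
print(labels[0])
```

In this toy setting, the <dog, surfboard> instance starts with all probability on "on" and picks up mass for "standing on" from the visually similar <person, surfboard> instance, which is the kind of relation-space extension the abstract describes.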
