Scene graph generation (SGG) aims to perceive objects and their relations in images, bridging the gap between upstream detection tasks and downstream high-level visual understanding tasks. It is widely acknowledged that SGG models over-fit the head predicates, which biases the generated scene graphs, and a series of debiasing methods have been proposed to address this problem. However, some existing debiasing SGG methods tend to over-fit the tail predicates instead, introducing another type of bias. To eliminate this one-way over-fitting of either head or tail predicates, this paper proposes a balanced relation prediction (BRP) module that is model-agnostic and compatible with existing re-balancing methods. Moreover, because relation prediction is based on object feature representations, this paper proposes a scene adaptive context fusion (SACF) module to refine them. Specifically, SACF models context with a chain structure in which the order of objects is adaptively arranged according to the scene content, achieving visual information fusion that adapts to the scene where the objects are located. Experiments on the VG and GQA datasets show that the proposed method achieves competitive results on the comprehensive metric across R@K and mR@K.
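To make the SACF idea concrete, the sketch below illustrates one plausible reading of the abstract: object features are re-ordered by a learned, scene-dependent score and then fused with a chain-structured encoder (here a bidirectional LSTM). This is a minimal illustration only; the class name `SACF`, the use of mean-pooled object features as the scene descriptor, and the LSTM choice are all assumptions not confirmed by the paper.

```python
import torch
import torch.nn as nn


class SACF(nn.Module):
    """Hypothetical sketch of scene-adaptive context fusion (not the paper's code)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        # Scores each object against a global scene descriptor; assumption:
        # mean-pooled object features stand in for the "scene content".
        self.order_score = nn.Linear(2 * dim, 1)
        # Chain-structured context encoder run over the re-ordered objects.
        self.chain = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        # obj_feats: (N, dim) features of the N detected objects in one image.
        scene = obj_feats.mean(dim=0, keepdim=True).expand_as(obj_feats)
        scores = self.order_score(torch.cat([obj_feats, scene], dim=-1)).squeeze(-1)
        order = scores.argsort(descending=True)  # scene-adaptive object ordering
        fused, _ = self.chain(obj_feats[order].unsqueeze(0))
        # Undo the permutation so outputs align with the input objects.
        inverse = order.argsort()
        return fused.squeeze(0)[inverse]


# Usage: refine 5 object features of dimension 512.
refined = SACF()(torch.randn(5, 512))
print(refined.shape)  # torch.Size([5, 512])
```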