Abstract

Scene graph generation is the task of identifying the objects in an image and, in particular, the relationships between them. Existing scene graph generation methods generally use the bounding-box region features of objects to identify these relationships. However, we argue that the features of the region where two objects overlap may play an important role in fine-grained relationship identification; indeed, some fine-grained relationships can only be obtained from these overlap region features. We therefore propose the Multi-Branch Feature Combination (MFC) module and the Overlap Region Transformer (ORT) module to comprehensively capture the visual features contained in the overlap region of two objects. Concretely, the MFC module uses deconvolution and multi-branch dilated convolution to obtain high-resolution, multi-receptive-field features in the overlap region. The ORT module uses a vision transformer to compute self-attention over the overlap region. Used jointly, the two modules complement each other: convolution contributes local connectivity and attention contributes global connectivity. We also design a Geometrical Center Augmented (GCA) module to obtain the relative position of the geometric centers of two objects, since the scale of the overlap region alone cannot accurately capture the relationship between them. Experiments show that our model ORGC (Overlap Region and Geometrical Center), which combines the MFC, ORT, and GCA modules, improves fine-grained relationship identification. On the Visual Genome dataset, our model outperforms the current state-of-the-art model by 4.4% on the R@50 evaluation metric, reaching a state-of-the-art result of 33.88.
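
The two geometric cues named in the abstract, the overlap region of two objects and the relative position of their geometric centers, can be illustrated with a minimal sketch. It is not the ORGC implementation. It assumes boxes in `(x1, y1, x2, y2)` corner format, and the function names `overlap_region` and `center_offset` are hypothetical. The abstract does not specify how the GCA module encodes the center information, so a plain offset vector is used here as a stand-in.

```python
from typing import Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def overlap_region(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection box of a and b, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x2 <= x1 or y2 <= y1:
        return None  # empty intersection: no overlap region to crop features from
    return (x1, y1, x2, y2)


def center(box: Box) -> Tuple[float, float]:
    """Geometric center of a box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def center_offset(subject: Box, obj: Box) -> Tuple[float, float]:
    """Offset from the subject's center to the object's center (illustrative only)."""
    (sx, sy), (ox, oy) = center(subject), center(obj)
    return (ox - sx, oy - sy)


# Hypothetical boxes for a subject and an object in the same image.
person = (10.0, 20.0, 60.0, 120.0)
horse = (30.0, 80.0, 130.0, 160.0)

print(overlap_region(person, horse))  # (30.0, 80.0, 60.0, 120.0)
print(center_offset(person, horse))   # (45.0, 50.0)
```

Two pairs of boxes can share an overlap region of the same size while their centers are arranged differently (for example, one object above the other versus beside it), which is why the center offset carries information that the overlap scale alone does not.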
