Abstract

As an important and challenging problem in computer vision, scene graph generation (SGG) aims to identify the underlying semantic relationships among objects in a given image for scene understanding. Prevalent SGG approaches typically adopt a learning pipeline that assumes only a single relationship exists for a particular object pair. Because a pair of objects is commonly linked by multiple relationships, we propose a multi-label scene graph generation pipeline with multi-grained features (MLMG-SGG), which formulates relationship detection as a multi-label classification problem during training and generates multigraphs at inference time. To better model fine-grained relationships, the proposed pipeline encodes the SGG feature representation at different spatial scales with a specially designed Multi-Grained Module (MGM), yielding multi-grained (i.e., object-level and region-level) features of objects. Experimental results on the benchmark dataset demonstrate significant performance gains when the proposed pipeline is used as a plug-in for state-of-the-art methods.
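The difference between single-label and multi-label relationship prediction can be illustrated with a minimal sketch. This is not the MLMG-SGG implementation. It assumes per-predicate logits are already available for each object pair, and it uses a standard sigmoid binary cross-entropy loss for training and a fixed score threshold (0.5, a hypothetical choice) at inference. The names `multilabel_bce` and `predict_multigraph`, and the predicate vocabulary, are illustrative only.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_bce(logits: List[float], targets: List[int]) -> float:
    """Mean binary cross-entropy over predicate classes.

    Each predicate is treated as an independent yes/no decision, so several
    targets for one object pair may be 1 at the same time (unlike softmax
    cross-entropy, which assumes exactly one correct predicate).
    Uses the numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def predict_multigraph(
    pair_logits: Dict[Tuple[str, str], List[float]],
    predicates: List[str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, str, str]]:
    """Keep every predicate whose sigmoid score passes the threshold.

    An object pair can therefore contribute several (subject, predicate, object)
    edges, which turns the scene graph into a multigraph.
    """
    edges = []
    for (subj, obj), logits in pair_logits.items():
        for name, z in zip(predicates, logits):
            if sigmoid(z) >= threshold:
                edges.append((subj, name, obj))
    return edges


if __name__ == "__main__":
    predicates = ["on", "sitting on", "near"]
    scores = {("man", "horse"): [2.0, 1.5, -3.0]}
    print(predict_multigraph(scores, predicates))
    print(multilabel_bce([2.0, 1.5, -3.0], [1, 1, 0]))
```

With these example scores, both "on" and "sitting on" pass the threshold, so the pair (man, horse) yields two edges. A single-label argmax over the same logits would keep only "on".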
