Abstract

A scene graph captures rich semantic information about an image by representing objects and their relationships as the nodes and edges of a graph. Recent works have demonstrated that scene graph representations improve the performance of various computer vision tasks such as image retrieval, action recognition, and visual question answering. Computationally efficient scene graph generation methods are required to leverage scene graphs in real-world applications (e.g., autonomous driving, robotics). A typical scene graph generation model consists of two modules: (i) an object detector and (ii) a scene graph classifier. The scene graph classifier predicts the object categories and the object-object relationships. The quadratic number of potential edges poses a major challenge in the scene graph classification task: detecting the relationship between every object pair, as in the traditional approach, is computationally intensive and does not scale. To address this issue, we propose a novel module named EdgeNet that directly predicts the set of relevant edges, pruning out a significant number of unrelated object pairs and thereby improving both the effectiveness and the efficiency of the scene graph classifier. EdgeNet is a generic module and can be plugged into an existing scene graph classifier. Experimental results highlight the effectiveness and efficiency of the proposed approach on the Visual Genome dataset.
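The pruning idea in the abstract can be made concrete with a small sketch. The code below contrasts the naive quadratic enumeration of candidate edges with an EdgeNet-style proposal step that keeps only the top-k most relevant pairs. The abstract does not specify how EdgeNet scores pairs, so the dot-product relevance score used here (and the function names `naive_pairs` / `edgenet_prune`) are purely illustrative assumptions, not the paper's method.

```python
import numpy as np

def naive_pairs(n):
    """Quadratic baseline: every ordered object pair is a candidate edge."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]

def edgenet_prune(features, k):
    """Hypothetical EdgeNet-style proposal: score all pairs, keep top-k.

    `features` is an (n, d) array of per-object features. The real EdgeNet
    is a learned module; the dot-product relevance score below is only a
    stand-in so the pruning step is runnable."""
    scores = features @ features.T              # (n, n) pairwise relevance
    np.fill_diagonal(scores, -np.inf)           # exclude self-relations
    top = np.argsort(scores, axis=None)[::-1][:k]
    return [tuple(divmod(int(idx), scores.shape[1])) for idx in top]

rng = np.random.default_rng(0)
n, d, k = 20, 8, 30
feats = rng.standard_normal((n, d))

all_pairs = naive_pairs(n)        # 20*19 = 380 candidate edges
kept = edgenet_prune(feats, k)    # only 30 pairs reach the relation classifier
```

Only the `kept` pairs would be passed to the downstream relationship classifier, so its cost grows with k rather than with the square of the number of detected objects.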
