Abstract

Image retrieval in complicated scenes is a challenging task that requires a comprehensive understanding of the image. In this paper, we propose a scene graph based image retrieval framework that combines scene graph generation with image retrieval and fine-tunes the search results via a dialogue mechanism. Specifically, we propose an image retrieval oriented scene graph generation model that takes an image and a text describing the image as inputs. The additional text input is used to control the generated scene graph: it provides information for a newly introduced attributes head to better predict the attributes, and at the same time helps construct an adjacency matrix. A Graph Convolutional Network is then used to gather information among nodes for precise relation estimation. Moreover, the scene graph can be modified by changing the text. Our proposed approach achieves state-of-the-art performance in both scene graph based image retrieval and scene graph generation on the Visual Genome dataset.
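
To illustrate how an adjacency matrix lets a graph convolution gather information among nodes, here is a minimal sketch of a single propagation step. It is not the model described in this paper: the symmetric normalization with self-loops, the function names (`normalize_adjacency`, `matmul`, `gcn_layer`), and the toy three-node graph (man, horse, hat) with its feature values are all assumptions chosen for illustration.

```python
import math

def normalize_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j).

    This symmetric normalization is an assumed, commonly used choice.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(normalized_adj @ features @ weight)."""
    hidden = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(v, 0.0) for v in row] for row in hidden]

# Hypothetical graph: node 0 = man, node 1 = horse, node 2 = hat.
# Edges man-horse and man-hat stand in for links inferred from the text input.
adjacency = [[0, 1, 1],
             [1, 0, 0],
             [1, 0, 0]]
features = [[1.0, 0.0],   # man
            [0.0, 1.0],   # horse
            [0.0, 0.0]]   # hat (no features of its own in this toy case)
weight = [[1.0, 0.0],     # identity weight, so only the mixing is visible
          [0.0, 1.0]]

updated = gcn_layer(adjacency, features, weight)
```

In this toy case the hat node starts with an all-zero feature vector and, after one step, carries a share of the man node's features because the two are linked in the adjacency matrix. The horse node is not adjacent to the hat, so none of its features reach the hat in a single step.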
