Abstract
Humans tend to understand an image scene by recognizing its visual elements and then conjecturing and inferring from them, which in turn lets them search for relevant images. In this paper, we address the problem of complex image retrieval by reasoning over dense image captions, which resembles the way humans perceive and search for images. Specifically, we transform complex image retrieval into a dense captioning and scene graph matching problem by using structured language descriptions for retrieval. Experimental results on a newly proposed large-scale content-based image retrieval dataset demonstrate the rationality and effectiveness of our method.
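As a rough illustration of retrieval by scene graph matching, the sketch below represents each graph as a set of (subject, relation, object) triples and ranks images by the fraction of query triples they contain. This is a simplified stand-in, not the matching procedure of the paper: the triple representation, the overlap score, and the names `match_score`, `retrieve`, and `index` are all assumptions made for the example, and the graphs are hand-written rather than produced by a dense captioning model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a scene graph is a set of
# (subject, relation, object) triples.
Triple = Tuple[str, str, str]


def match_score(query: Set[Triple], image: Set[Triple]) -> float:
    """Fraction of query triples that also appear in the image graph."""
    if not query:
        return 0.0
    return len(query & image) / len(query)


def retrieve(
    query: Set[Triple], index: Dict[str, Set[Triple]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Rank indexed images by match score; ties are broken by image name."""
    scored = [(name, match_score(query, graph)) for name, graph in index.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy index; in practice the graphs would be parsed from dense captions.
index = {
    "img_a": {("man", "riding", "horse"), ("horse", "on", "beach")},
    "img_b": {("man", "holding", "umbrella")},
    "img_c": {("man", "riding", "horse"), ("dog", "near", "man")},
}
query = {("man", "riding", "horse"), ("horse", "on", "beach")}
ranked = retrieve(query, index, top_k=2)
print(ranked)
```

Exact triple overlap is the simplest possible score; a practical system would likely need soft matching of synonyms and partial relations, which this sketch leaves out.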