Abstract

We propose an approach that enhances the performance of arbitrary existing cross-modal image retrieval methods. Most cross-modal image retrieval methods focus on directly and accurately computing similarities between a text query and candidate images. However, their retrieval performance is affected by the ambiguity of text queries and the bias of target databases (DBs). Dealing with ambiguous text queries and biased DBs leads to accurate cross-modal image retrieval in real-world applications. In this paper, we propose a re-ranking method using spaces that can extend arbitrary cross-modal image retrieval methods to enhance their performance. The proposed method comprises two approaches: "DB-adaptive re-ranking" and "modality-driven clue information extraction". Our method estimates clue information that can effectively distinguish the desired image from the whole target DB, and then receives the user's feedback on the estimated information. Furthermore, by focusing on spaces, our method extracts more detailed information from the query text and the target DB, which enables more accurate re-ranking. Our method allows users to reach their single desired image simply by answering questions. Experimental results on MSCOCO, Visual Genome, and newly introduced datasets containing images with a particular object show that the proposed method can enhance the performance of state-of-the-art cross-modal image retrieval methods.
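To make the interactive re-ranking pipeline concrete, below is a minimal sketch of how user feedback on estimated clue information might be folded back into a base ranking. It assumes precomputed embeddings from an off-the-shelf cross-modal model; the function name `rerank_with_feedback`, the blending weight `alpha`, and the ±1 yes/no feedback scheme are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def rerank_with_feedback(query_emb, image_embs, clue_embs, answers, alpha=0.5):
    """Re-rank candidate images by blending base query-image similarities
    with similarities to user-confirmed/rejected clue embeddings.

    query_emb:  (d,) text-query embedding from an arbitrary retrieval model
    image_embs: (n, d) candidate-image embeddings from the target DB
    clue_embs:  list of (d,) embeddings of estimated clue information
    answers:    list of bools, the user's yes/no answer for each clue
    alpha:      hypothetical weight balancing base ranking vs. feedback
    """
    eps = 1e-8
    img_norms = np.linalg.norm(image_embs, axis=1) + eps

    # Base similarities: cosine similarity between the text query and
    # every candidate image (the arbitrary existing retrieval method).
    base = image_embs @ query_emb / (img_norms * (np.linalg.norm(query_emb) + eps))

    # Feedback signal: push candidates toward confirmed clues (+) and
    # away from rejected ones (-).
    feedback = np.zeros(len(image_embs))
    for clue_emb, ans in zip(clue_embs, answers):
        sim = image_embs @ clue_emb / (img_norms * (np.linalg.norm(clue_emb) + eps))
        feedback += sim if ans else -sim

    scores = (1 - alpha) * base + alpha * feedback
    return np.argsort(-scores)  # candidate indices, best first
```

In the proposed pipeline, the clue embeddings would come from the DB-adaptive estimation step that selects clues distinguishing the desired image within the target DB; here they are simply passed in as inputs for illustration.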
