Abstract

Image search reranking, which aims to improve text-based image search results with the help of other cues, has grown into an active research topic. Most existing reranking methods focus only on visual cues. However, visual cues alone cannot always provide enough information for the reranking process. Although some approaches try to fuse multiple image cues for reranking, they exploit the relationships among these cues only weakly, or not at all. In this paper, we present a novel image reranking framework, Joint-Rerank, which treats the multiple modalities of an image (or multiple cues) jointly as interdependent attributes of a single image entity. Joint-Rerank models the images as a multigraph in which each image is a node with multimodal attributes (textual and visual cues) and the parallel edges between nodes measure both intra-modal and inter-modal similarities between images. In addition, each node carries a "self-consistency" score that measures how consistent the multiple modalities of an image are with each other. To solve the reranking problem, we first degenerate the multigraph into a new complete graph, and then employ a random walk on the degenerated graph to propagate the relevance score of each node. Finally, the relevance scores of the multiple modalities are fused to rank the images. Moreover, in Joint-Rerank a "cross-modal" walk is possible, i.e., a surfer can jump from one image to another following both intra-modal and inter-modal links. Within this framework, we propose two methods, Sym-Joint-Rerank and Asym-Joint-Rerank, which use different approaches to measure the inter-modal similarities between two nodes. Experimental results on a large web query dataset containing 353 image search queries show that both methods are superior or highly competitive compared with several state-of-the-art reranking algorithms.
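To make the overall pipeline concrete, the sketch below illustrates random-walk relevance propagation over a fused multimodal similarity graph. The linear fusion of textual and visual similarities, the restart formulation, and all function and parameter names (`rerank`, `alpha`, `beta`) are illustrative assumptions; this is not the paper's exact multigraph degeneration or score-fusion scheme.

```python
# A minimal sketch (assumed, not the authors' implementation) of random-walk
# reranking on a multimodal image graph, given precomputed textual and visual
# similarity matrices and the initial text-based search relevance scores.
import numpy as np

def rerank(sim_text, sim_visual, init_scores, alpha=0.85, beta=0.5, iters=50):
    """Propagate relevance scores by a random walk over a graph whose edge
    weights mix textual and visual similarities.

    sim_text, sim_visual : (n, n) nonnegative similarity matrices
    init_scores          : (n,) relevance scores from the text-based search
    alpha                : probability of following an edge vs. restarting
    beta                 : weight of the textual modality in the fused edges
    """
    # Fuse the two modalities into a single edge-weight matrix
    # (hypothetical linear fusion stands in for the multigraph degeneration).
    W = beta * sim_text + (1.0 - beta) * sim_visual
    np.fill_diagonal(W, 0.0)

    # Row-normalize to obtain a transition matrix.
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    # Random walk with restart: the surfer follows graph edges with
    # probability alpha and otherwise teleports back to the initial ranking.
    r0 = init_scores / init_scores.sum()
    r = r0.copy()
    for _ in range(iters):
        r = alpha * (P.T @ r) + (1.0 - alpha) * r0
    return np.argsort(-r), r  # reranked indices and final relevance scores

# Toy usage with random symmetric similarities for 5 images.
rng = np.random.default_rng(0)
S_t = rng.random((5, 5)); S_t = (S_t + S_t.T) / 2
S_v = rng.random((5, 5)); S_v = (S_v + S_v.T) / 2
order, scores = rerank(S_t, S_v, init_scores=np.array([5.0, 4, 3, 2, 1]))
print(order, scores)
```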
