Abstract
In this paper, we summarize our work on cross-media retrieval, where the queries and the retrieved content are of different media types. We study cross-media retrieval in the context of two popular applications in multimedia retrieval: image retrieval by textual queries and sentence retrieval by visual queries. For image retrieval by textual queries, we propose text2image, which converts the computation of cross-media relevance between images and textual queries into a comparison of visual similarity among images. We also propose cross-media relevance fusion, a conceptual framework that combines multiple cross-media relevance estimators. These two techniques resulted in a winning entry in the Microsoft Image Retrieval Challenge at ACM MM 2015. For sentence retrieval by visual queries, we propose to compute cross-media relevance exclusively in a visual space. We contribute Word2VisualVec, a deep neural network architecture that learns to predict a visual feature representation from textual input. With the proposed Word2VisualVec model, we won the Video to Text Description task at TRECVID 2016.
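To make the Word2VisualVec idea concrete, the following is a minimal sketch of a text-to-visual-feature regressor in the same spirit: a small network maps a pooled sentence embedding into a visual feature space so that sentences and images/videos can be compared there. The dimensions, layer sizes, and MSE loss are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of a Word2VisualVec-style model (assumed dimensions and loss, for illustration only).
import torch
import torch.nn as nn

class Word2VisualVecSketch(nn.Module):
    def __init__(self, text_dim=500, hidden_dim=1000, visual_dim=2048):
        super().__init__()
        # MLP mapping a pooled sentence embedding into visual feature space.
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, visual_dim),
        )

    def forward(self, sentence_vec):
        return self.net(sentence_vec)

# Toy training step: regress predicted vectors onto paired visual features.
model = Word2VisualVecSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

sentence_vec = torch.randn(8, 500)   # batch of pooled word embeddings (hypothetical input)
visual_feat = torch.randn(8, 2048)   # visual features of the paired images or video frames

pred = model(sentence_vec)
loss = loss_fn(pred, visual_feat)
loss.backward()
optimizer.step()
```

At retrieval time, a query (sentence or visual content) and the candidates are compared directly in this visual space, e.g., by cosine similarity between predicted and extracted visual features.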