Abstract

In this paper, a semantic model for cross-modal and multi-modal retrieval is studied. We assume that the semantic correlation of multimedia data from different modalities can be described in a probabilistic generative framework: media items from different modalities are generated by the same semantic concepts, and the generation process of each item is conditionally independent given those concepts. Based on this assumption, we propose the semantic generation model (SGM) for cross-modal and multi-modal analysis. We study two types of methods for estimating the semantic conditional distributions of SGM: a direct method based on Gaussian distributions and an indirect method based on random forests. Methods for cross-modal and multi-modal retrieval are then derived from SGM. Experimental results show that the SGM-based methods for cross-modal retrieval improve accuracy over the state-of-the-art cross-modal method without increasing time consumption, and that the SGM-based multi-modal retrieval methods also outperform traditional methods in image retrieval. Moreover, the indirect SGM-based method outperforms the direct one in both types of retrieval, which indicates that the indirect method better describes the semantic distribution.
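The core assumption above — that items from different modalities are generated by a shared semantic concept and are conditionally independent given it — can be illustrated with a minimal synthetic sketch. Everything here is hypothetical: we assume isotropic unit-variance Gaussian class-conditionals per modality (a toy stand-in for the paper's "direct method"), a uniform concept prior, and randomly generated means; it is not the authors' actual estimator or data.

```python
import numpy as np

# Synthetic setup: n_concepts shared semantic concepts, each with its own
# Gaussian mean in an image feature space and a text feature space.
rng = np.random.default_rng(0)
n_concepts, d_img, d_txt = 3, 4, 5
img_means = rng.normal(size=(n_concepts, d_img))  # assumed parameters
txt_means = rng.normal(size=(n_concepts, d_txt))  # assumed parameters

def log_gauss(x, means):
    # Isotropic unit-variance Gaussian log-density (up to a constant),
    # evaluated against every concept's mean at once.
    return -0.5 * np.sum((x - means) ** 2, axis=-1)

def concept_posterior(x, means):
    # p(c | x) under a uniform prior over concepts.
    log_p = log_gauss(x, means)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

def cross_modal_score(img_query, txt_item):
    # Conditional independence given the concept lets us bridge modalities:
    # score = sum_c p(c | image) * p(text | c).
    post = concept_posterior(img_query, img_means)
    lik = np.exp(log_gauss(txt_item, txt_means))
    return float(post @ lik)

# A text item drawn near concept 0's mean should outscore one drawn near
# concept 2's mean, for an image query drawn near concept 0's mean.
img_q = img_means[0] + 0.1 * rng.normal(size=d_img)
txt_same = txt_means[0] + 0.1 * rng.normal(size=d_txt)
txt_other = txt_means[2] + 0.1 * rng.normal(size=d_txt)
print(cross_modal_score(img_q, txt_same) > cross_modal_score(img_q, txt_other))
```

Replacing the Gaussian density with any other estimator of p(x | c) (e.g. one derived from a random forest, as in the paper's indirect method) changes only `log_gauss`; the cross-modal scoring rule stays the same.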
