Abstract

Cross-media retrieval has attracted considerable attention and has become an increasingly worthwhile research direction in information retrieval. Unlike many related works that perform retrieval by mapping heterogeneous data into a common representation subspace with a pair of projection matrices, we feed multi-modal media data into a neural network model that employs a deep sparse neural network pre-trained with restricted Boltzmann machines and outputs their semantic understanding for semantic matching (RSNN-SM). Consequently, the heterogeneous modality data are represented by their top-level semantic outputs, and cross-media retrieval is performed by measuring their semantic similarities. Experimental results on several real-world datasets show that RSNN-SM achieves the best performance and outperforms state-of-the-art approaches.
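The retrieval step the abstract describes can be illustrated with a minimal sketch. Assuming each modality-specific network has already produced a top-level semantic vector per item (the `query` and `gallery` arrays below are hypothetical stand-ins for those outputs, not the paper's actual model), cross-media retrieval reduces to ranking items of the other modality by semantic similarity:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two semantic vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve(query_semantic, gallery_semantics, top_k=5):
    # Rank gallery items of the other modality by similarity of their
    # top-level semantic outputs to the query's semantic output.
    scores = [cosine_similarity(query_semantic, g) for g in gallery_semantics]
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i), scores[int(i)]) for i in order]

# Hypothetical stand-ins: in the paper, these vectors would come from the
# RBM-pretrained deep sparse networks, one per modality.
rng = np.random.default_rng(0)
query = rng.random(10)            # semantic output of, e.g., an image query
gallery = rng.random((100, 10))   # semantic outputs of, e.g., 100 text items
print(retrieve(query, gallery))
```

Cosine similarity is used here only as a plausible choice of semantic similarity measure; the abstract does not specify which measure the authors adopt.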
