Distributed cross-media multiple binary subspace learning

Xueyi Zhao, Zhongfei Zhang, Chenyi Zhang

doi:10.1007/s13735-015-0081-4

Abstract

Large-scale data is ubiquitous in today’s real-world applications, including learning on cross-media data. We propose a semi-supervised learning method, named Multiple Binary Subspace Regression, for concept detection on cross-media data. To mine the features common to data with multiple modalities, we simultaneously project the original cross-media data onto the same subspace-level representation by mapping to the corresponding subspaces for dimensionality reduction. All the subspaces are set to be binary, so the subsequent computation involves only additions and no multiplications. The dimensionality reduction to a binary subspace and the concept detection on this subspace are optimized simultaneously, leading to a semi-supervised model. To handle large-scale data, the learning method can be readily implemented on a MapReduce-based Hadoop system. Empirical studies demonstrate its competitive performance in convergence, efficiency, and scalability in comparison with state-of-the-art methods.
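
The addition-only property of a binary projection can be illustrated with a minimal sketch. This is not the method of the paper: it assumes the binary basis has entries in {0, 1} (the abstract does not state the encoding), and the names `binary_project`, `x`, and `B` are hypothetical. Each projected coordinate is then just the sum of the input features that the corresponding binary column selects.

```python
import numpy as np

def binary_project(x, B):
    """Project x (shape d) onto a binary basis B (shape d x k).

    Assumption: entries of B are in {0, 1}, so each output coordinate is
    a plain sum of the selected input features, with no multiplications.
    """
    d, k = B.shape
    z = np.zeros(k, dtype=float)
    for j in range(k):
        # Boolean mask picks the features where column j is 1; then add them up.
        z[j] = x[B[:, j].astype(bool)].sum()
    return z

# Toy data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=8)
B = rng.integers(0, 2, size=(8, 3))

z_add = binary_project(x, B)   # additions only
z_mul = x @ B                  # ordinary matrix product, for comparison
```

Under this assumption the two results agree up to floating-point rounding, which is why a binary subspace can replace multiply-accumulate operations with additions alone.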

Full Text