Abstract

This paper proposes a novel method for cross-modal retrieval. In addition to the traditional vector (text)-to-vector (image) framework, we adopt a matrix (text)-to-matrix (image) framework to faithfully characterize the structures of the different feature spaces. Moreover, we propose a novel metric learning framework to learn a discriminative structured subspace in which the underlying data distribution is preserved, ensuring a desirable metric. Concretely, the proposed method consists of three steps. First, multiorder statistics are used to represent images and texts, enriching the feature information: we jointly use the covariance (second-order), mean (first-order), and bag of visual (textual) features (zeroth-order) to characterize each image and text. Second, considering that the heterogeneous covariance matrices lie on different Riemannian manifolds while the other features lie in different Euclidean spaces, we propose a unified metric learning framework that integrates multiple distance metrics, one for each order of statistical feature. This framework preserves the underlying data distribution and exploits complementary information to better match heterogeneous data. Finally, the similarity between the different modalities is measured by transforming the multiorder statistical features into the common subspace. Experiments on two public datasets demonstrate that the proposed method outperforms previous methods.

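As an illustration of the first step, the sketch below shows one way to compute the zeroth-, first-, and second-order statistics from a set of local descriptors (e.g., SIFT features for an image or word vectors for a text), together with the Log-Euclidean distance as one possible Riemannian metric between the resulting covariance matrices. The function names, the hard-assignment bag-of-features coding, the regularizer eps, and the choice of the Log-Euclidean metric are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def multiorder_stats(local_feats, codebook, eps=1e-6):
    """Zeroth-, first-, and second-order statistics of one image/text.

    local_feats : (n, d) array of local descriptors
    codebook    : (k, d) array of visual/textual words
    """
    # Zeroth order: bag-of-features histogram via hard assignment to codewords.
    dists = np.linalg.norm(local_feats[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(codebook)).astype(float)
    hist /= hist.sum()

    # First order: mean vector (a point in Euclidean space).
    mean = local_feats.mean(axis=0)

    # Second order: covariance matrix, regularized so it is symmetric positive
    # definite, i.e., a valid point on the SPD Riemannian manifold.
    cov = np.cov(local_feats, rowvar=False) + eps * np.eye(local_feats.shape[1])
    return hist, mean, cov

def log_euclidean_dist(cov_a, cov_b):
    """Log-Euclidean distance between two SPD matrices (one common Riemannian
    metric; the abstract does not specify which metric the paper adopts)."""
    def spd_log(m):
        w, v = np.linalg.eigh(m)          # eigendecomposition of an SPD matrix
        return (v * np.log(w)) @ v.T      # matrix logarithm via its eigenvalues
    return np.linalg.norm(spd_log(cov_a) - spd_log(cov_b), ord="fro")

# Toy usage: 200 and 150 local descriptors of dimension 8, 32-word codebook.
rng = np.random.default_rng(0)
feats_img = rng.normal(size=(200, 8))
feats_txt = rng.normal(size=(150, 8))
codebook = rng.normal(size=(32, 8))
h1, m1, c1 = multiorder_stats(feats_img, codebook)
h2, m2, c2 = multiorder_stats(feats_txt, codebook)
print(log_euclidean_dist(c1, c2))
```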