Abstract

With the explosive growth of multimedia data such as text and images, large-scale cross-modal retrieval has attracted increasing attention from the vision community, yet it still confronts the problems of the so-called “media gap” and search efficiency. A leading line of existing cross-modal retrieval methods alleviates these problems by capturing correlations across modalities while learning hash codes. However, supervised label information is usually considered independently in either generating the hash codes or learning the hashing functions. To address this, we propose a label-guided correlation cross-modal hashing method (LGCH), which investigates an alternative way to exploit label information for effective cross-modal retrieval from two aspects: 1) LGCH learns a discriminative common latent representation across modalities through joint generalized canonical correlation analysis (GCCA) and a linear classifier; 2) to generate binary codes and the hashing functions simultaneously, LGCH introduces an adaptive parameter that effectively fuses the common latent representation with the label-guided representation. Moreover, each subproblem of LGCH admits an elegant analytical solution. Experiments on three multimedia datasets show that LGCH performs favorably against many well-established baselines.
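To make the second aspect concrete, below is a minimal sketch of the fusion-and-binarization idea the abstract describes: a weighting parameter blends the common latent representation with the label-guided representation, and binary hash codes are obtained by sign thresholding. The function and parameter names (`fuse_and_binarize`, `mu`) are assumptions for illustration only; the paper's actual objective and the way the adaptive parameter is learned are not given in this excerpt.

```python
import numpy as np

def fuse_and_binarize(Z_common, Z_label, mu=0.5):
    """Hypothetical sketch: fuse two representations and binarize.

    Z_common : (n_samples, code_length) common latent representation (e.g. from GCCA).
    Z_label  : (n_samples, code_length) label-guided representation.
    mu       : assumed fusion weight standing in for the adaptive parameter.
    """
    # Weighted fusion of the two representations
    fused = mu * Z_common + (1.0 - mu) * Z_label
    # Zero-center each bit, then take signs to obtain {-1, +1} hash codes
    fused = fused - fused.mean(axis=0, keepdims=True)
    B = np.sign(fused)
    B[B == 0] = 1  # avoid zero entries in the codes
    return B

# Toy usage with random stand-ins for the two learned representations
rng = np.random.default_rng(0)
Z_common = rng.standard_normal((6, 16))
Z_label = rng.standard_normal((6, 16))
codes = fuse_and_binarize(Z_common, Z_label, mu=0.6)
print(codes.shape)  # (6, 16)
```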
