Abstract
Modern cross-modal retrieval systems must find semantically relevant content across heterogeneous modalities. Previous studies construct unified dense correlation models on small-scale cross-modal data and therefore cannot handle large-scale Web data, because (a) the content of Web cross-media data is divergent; (b) the topic-sensitive structure information in the high-dimensional space is neglected; and (c) the data must be organized as strictly corresponding pairs, a condition rarely satisfied in real-world scenarios. To address these challenges, we propose a cluster-sensitive cross-modal correlation learning framework. First, instead of a unified correlation model, we learn a set of cluster-sensitive correlation sub-models, which better fit the content divergence across modalities; structured sparsity regularization on the projection vectors yields a set of interpretable, structured sparse correlation sub-models. Second, to compensate for missing correspondences, we take full advantage of both intra-modal affinity and inter-modal co-occurrence: the projected coordinates of neighboring data within a modality are encouraged to be similar, and the inconsistency of the cluster-sensitive projections is minimized. The learned correlation model adapts to the content divergence and thus achieves better model generality and a better bias–variance trade-off. Extensive experiments on two large-scale cross-modal datasets demonstrate the effectiveness of our approach.
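To make the structured-sparsity idea concrete, the following is a minimal toy sketch, not the authors' actual model: a linear cross-modal mapping with an ℓ2,1 (row-sparsity) penalty on the projection matrix, learned by proximal gradient descent. All data, dimensions, and function names here are hypothetical; the point is only that the row-wise penalty zeroes out entire rows of the projection, i.e., it deselects uninformative input features.

```python
import numpy as np

def prox_l21(W, t):
    """Proximal operator of t * ||W||_{2,1}: row-wise soft-thresholding.
    Shrinks each row's norm by t, driving whole rows to zero (structured sparsity)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

def fit_sparse_projection(X, Y, lam=1.0, lr=1e-3, iters=2000):
    """Toy sketch: min_W ||X W - Y||_F^2 + lam * ||W||_{2,1},
    mapping modality X (e.g. image features) onto modality Y (e.g. text features),
    solved by proximal gradient descent starting from W = 0."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        grad = 2.0 * X.T @ (X @ W - Y)   # gradient of the squared-error term
        W = prox_l21(W - lr * grad, lr * lam)
    return W

# Hypothetical paired data: only the first 5 of X's 10 features determine Y.
rng = np.random.default_rng(0)
X_info = rng.standard_normal((200, 5))           # informative features
X = np.hstack([X_info, rng.standard_normal((200, 5))])  # plus 5 pure-noise features
B = rng.standard_normal((5, 4))
Y = X_info @ B                                   # the other modality
W = fit_sparse_projection(X, Y)
```

Under this toy setup the rows of `W` corresponding to the noise features are driven to (near) zero, while the informative rows survive, which is the interpretability benefit the abstract attributes to the structured sparse sub-models.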