Abstract

The core of cross-modal retrieval is to measure the similarity between data of different modalities. The mainstream approach is to construct a common subspace via representation learning, in which different types of data can be compared directly. However, most of these methods exploit only part of the information in the dataset, and their objective functions incur varying degrees of information loss. In this paper, we present a novel cross-modal learning framework called the Information Aggregation Semantic Adversarial Network, which minimizes information loss through adversarial learning and the dual constraints of two subspaces. Specifically, the proposed cross-modal information aggregation constraint, defined on the common subspace, aggregates global and fine-grained information simultaneously to generate common representations that capture cross-modal similarity, while competing against a discriminator trained to distinguish the original modality of each common representation. Furthermore, a semantic constraint is imposed to improve the semantic discriminability of the common representations, based on the latent association between labels and representations in a semantic subspace. Through the joint exploitation of these components, the information loss in the cross-modal process is greatly reduced. Extensive experiments on three widely used benchmark datasets demonstrate that the proposed method is effective for cross-modal learning and significantly outperforms state-of-the-art cross-modal retrieval methods.
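
To make the adversarial common-subspace idea concrete, the following is a minimal sketch: two modality-specific encoders map image and text features into a common subspace, a paired similarity term stands in for the information aggregation constraint, and the encoders compete against a discriminator that tries to identify the original modality of each common representation. The encoder architectures, feature dimensions, loss weight, and names such as `ModalityDiscriminator` are illustrative assumptions rather than the authors' implementation, and the semantic-subspace constraint is omitted here.

```python
# Minimal sketch of adversarial common-subspace learning for cross-modal retrieval.
# Dimensions, architectures, and the 0.1 adversarial weight are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Projects one modality (e.g. image or text features) into the common subspace."""
    def __init__(self, in_dim, common_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, common_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class ModalityDiscriminator(nn.Module):
    """Predicts which modality a common representation came from (0 = image, 1 = text)."""
    def __init__(self, common_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(common_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, z):
        return self.net(z).squeeze(-1)

img_enc, txt_enc, disc = Encoder(4096), Encoder(300), ModalityDiscriminator()
opt_gen = torch.optim.Adam(list(img_enc.parameters()) + list(txt_enc.parameters()), lr=1e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)

# Dummy paired batch of image and text features (assumed dimensions).
img, txt = torch.randn(32, 4096), torch.randn(32, 300)

# Discriminator step: learn to separate the two modalities in the common subspace.
z_img, z_txt = img_enc(img), txt_enc(txt)
d_logits = torch.cat([disc(z_img.detach()), disc(z_txt.detach())])
d_labels = torch.cat([torch.zeros(32), torch.ones(32)])
d_loss = F.binary_cross_entropy_with_logits(d_logits, d_labels)
opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

# Encoder step: align paired representations while fooling the discriminator
# (labels flipped so that matched image/text pairs become modality-indistinguishable).
z_img, z_txt = img_enc(img), txt_enc(txt)
align_loss = (1 - F.cosine_similarity(z_img, z_txt)).mean()
adv_loss = F.binary_cross_entropy_with_logits(
    torch.cat([disc(z_img), disc(z_txt)]),
    torch.cat([torch.ones(32), torch.zeros(32)]))
g_loss = align_loss + 0.1 * adv_loss
opt_gen.zero_grad(); g_loss.backward(); opt_gen.step()
```

In this sketch the alignment term pulls paired image and text representations together, while the adversarial term pushes the encoders toward modality-invariant representations; the full method additionally aggregates fine-grained information and applies the label-based semantic constraint described in the abstract.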
