A heterogeneity gap exists between multi-modal data, making it difficult to measure their similarity directly. A common way to bridge this gap is representation learning. Owing to its adversarial optimization principle and its ability to learn cross-modal correlations efficiently, cross-modal retrieval based on Generative Adversarial Networks (GANs) has recently received significant attention. However, most GAN-based cross-modal learning approaches do not fully exploit the latent semantic information. In this paper, we propose a novel Adaptive Adversarial Learning (AAL) based cross-modal retrieval method. The generator for each modality projects heterogeneous data into a common latent subspace, while the discriminator competes with the generator to maintain discriminability. In addition, three task-specific loss functions are designed for the generators to comprehensively exploit semantic and label information. One problem is that directly optimizing the generator network ignores how much each of these losses contributes to training. To overcome this challenge, we present an adaptive balance strategy that assigns an appropriate contribution to each loss according to its degree of dispersion. Comprehensive experiments on three widely used databases show that the proposed method is effective and outperforms existing cross-modal retrieval methods.
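The adaptive balance strategy can be pictured as a dispersion-based re-weighting of the generator's multiple losses. The sketch below is a minimal illustration rather than the paper's formulation: the three loss names, the use of the standard deviation over a short loss history as the dispersion measure, and the direct proportionality between dispersion and weight are all assumptions made for the example.

```python
# Illustrative sketch only: weight several generator losses adaptively by a
# dispersion measure computed over their recent values. The specific measure
# (std over a sliding window) and the proportional weighting are assumptions.
import torch
from collections import deque


class AdaptiveLossBalancer:
    """Keeps a short history of each loss and derives per-loss weights
    from the dispersion (here: standard deviation) of that history."""

    def __init__(self, num_losses: int, history: int = 10, eps: float = 1e-8):
        self.histories = [deque(maxlen=history) for _ in range(num_losses)]
        self.eps = eps

    def weights(self, losses):
        # Record the current (detached) scalar value of each loss.
        for hist, loss in zip(self.histories, losses):
            hist.append(float(loss.detach()))
        # Dispersion of each loss over its recent history; fall back to a
        # uniform value until enough history has accumulated.
        disps = torch.stack([
            torch.tensor(list(h)).std(unbiased=False) if len(h) > 1 else torch.tensor(1.0)
            for h in self.histories
        ])
        # Normalize so the weights sum to 1.
        return disps / (disps.sum() + self.eps)


# Toy usage with three placeholder generator losses (adversarial, semantic,
# label); in practice these would come from the respective loss modules.
balancer = AdaptiveLossBalancer(num_losses=3)
adv_loss, sem_loss, label_loss = (torch.rand(()) for _ in range(3))
w = balancer.weights([adv_loss, sem_loss, label_loss])
total = w[0] * adv_loss + w[1] * sem_loss + w[2] * label_loss
print("weights:", w.tolist(), "total:", float(total))
```

Because the weights are computed from detached loss values, they act as constants during backpropagation, so only the relative emphasis on each loss changes from step to step.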