Abstract

Existing cross-modal retrieval methods based on generative adversarial networks cannot fully exploit inter-modality invariance. To address this problem, a novel cross-modal retrieval method based on full-modal autoencoders with a generative adversarial mechanism is proposed. Two parallel full-modal autoencoders embed samples of different modalities into a common space. Each full-modal autoencoder reconstructs not only the feature representation of its own modality but also that of the other modality. A classifier predicts the categories of the embedded features in the common space, preserving the semantic discriminative information of the samples. Three discriminators determine the modality of the input features and work cooperatively to fully exploit inter-modality invariance. Mean average precision (mAP) is used to evaluate retrieval accuracy, and extensive experiments are conducted on three public datasets: Pascal Sentence, Wikipedia and NUS-WIDE-10k. Compared with ten state-of-the-art cross-modal retrieval methods, including traditional and deep learning methods, the mAP of the proposed method improves by at least 4.8%, 1.4% and 1.1% on the three datasets, respectively. The experimental results demonstrate the effectiveness of the proposed method.
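
To make the described architecture more concrete, the following is a minimal PyTorch sketch of one full-modal autoencoder together with the classifier and a modality discriminator. The class names, layer widths and feature dimensions are illustrative assumptions, not the authors' exact configuration; the full method additionally couples two such autoencoders and three discriminators with adversarial and reconstruction losses that are not reproduced here.

```python
import torch
import torch.nn as nn

class FullModalAutoencoder(nn.Module):
    """Encodes one modality into the common space and decodes that embedding
    back into BOTH modalities' feature spaces (layer sizes are assumptions)."""
    def __init__(self, in_dim, common_dim, img_dim, txt_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, common_dim))
        # Two decoder heads: reconstruct the own-modality features and the
        # other modality's features from the shared embedding.
        self.dec_img = nn.Sequential(nn.Linear(common_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, img_dim))
        self.dec_txt = nn.Sequential(nn.Linear(common_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, txt_dim))

    def forward(self, x):
        z = self.encoder(x)  # embedding in the common space
        return z, self.dec_img(z), self.dec_txt(z)

def make_classifier(common_dim, num_classes):
    """Predicts semantic categories from common-space embeddings, preserving
    discriminative information."""
    return nn.Linear(common_dim, num_classes)

def make_discriminator(feat_dim):
    """Predicts which modality an input feature comes from; the method uses
    three such discriminators working cooperatively."""
    return nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                         nn.Linear(256, 1), nn.Sigmoid())

# Example wiring for an image/text pair (feature dimensions are assumptions).
img_ae = FullModalAutoencoder(in_dim=4096, common_dim=256, img_dim=4096, txt_dim=300)
txt_ae = FullModalAutoencoder(in_dim=300, common_dim=256, img_dim=4096, txt_dim=300)
z_img, _, _ = img_ae(torch.randn(8, 4096))
z_txt, _, _ = txt_ae(torch.randn(8, 300))
```

In this setup, cross-modal retrieval would be performed by ranking candidates according to the similarity of their embeddings z_img and z_txt in the shared space.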
