Zero-shot Cross-modal Retrieval by Assembling AutoEncoder and Generative Adversarial Network

Xing Xu, Huimin Lu, Jialin Tian, Heng Tao Shen, Jie Shao, Kaiyi Lin

doi:10.1145/3424341

Abstract

Conventional cross-modal retrieval models mainly assume that the training and testing sets cover the same classes. This assumption limits their extensibility to zero-shot cross-modal retrieval (ZS-CMR), where the testing set consists of unseen classes that are disjoint from the seen classes in the training set. The ZS-CMR task is more challenging because of the heterogeneous distributions of different modalities and the semantic inconsistency between seen and unseen classes. A few recently proposed approaches, inspired by zero-shot learning (ZSL), estimate the distribution underlying multimodal data with generative models and transfer knowledge from seen classes to unseen classes by leveraging class embeddings. However, directly borrowing the idea from ZSL is not fully suited to the retrieval task, since the core of the retrieval task is learning the common space. To address these issues, we propose a novel approach named Assembling AutoEncoder and Generative Adversarial Network (AAEGAN), which combines the strengths of the AutoEncoder (AE) and the Generative Adversarial Network (GAN) to jointly incorporate common latent space learning, knowledge transfer, and feature synthesis for ZS-CMR. In addition, instead of using class embeddings as the common space, AAEGAN maps all multimodal data into a learned latent space with distribution alignment via three coupled AEs. We empirically show a remarkable improvement on the ZS-CMR task and establish state-of-the-art or competitive performance on four image-text retrieval datasets.
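
To make the ZS-CMR setting concrete, the following is a minimal sketch of retrieval in a common latent space when the test classes are disjoint from the training classes. It is not the AAEGAN implementation: the fixed weight matrices `IMAGE_ENCODER` and `TEXT_ENCODER`, the class names, and the toy feature vectors are all hypothetical stand-ins for trained encoders and real data, and cosine similarity is an assumed ranking measure that the abstract does not specify.

```python
import math

# Hypothetical fixed weights standing in for trained modality encoders
# (illustrative only, not taken from the paper).
IMAGE_ENCODER = [[1.0, 0.0], [0.0, 1.0]]            # 2-d image feature -> 2-d latent
TEXT_ENCODER = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # 3-d text feature -> 2-d latent


def encode(weights, feature):
    """Apply a linear map: project a feature vector into the common latent space."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def cosine(a, b):
    """Cosine similarity between two latent vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_latent, gallery_latents):
    """Return gallery indices sorted by descending similarity to the query."""
    return sorted(
        range(len(gallery_latents)),
        key=lambda i: cosine(query_latent, gallery_latents[i]),
        reverse=True,
    )


# Zero-shot split: classes seen in training and classes used at test time
# must not overlap (class names are made up for illustration).
seen_classes = {"cat", "dog"}
unseen_classes = {"zebra", "whale"}
assert seen_classes.isdisjoint(unseen_classes)

# Test-time gallery of texts and an image query, all from unseen classes.
text_gallery = [("zebra", [1.0, 0.1, 0.0]), ("whale", [0.0, 0.5, 0.5])]
image_query = ("zebra", [0.9, 0.2])

query_latent = encode(IMAGE_ENCODER, image_query[1])
gallery_latents = [encode(TEXT_ENCODER, feature) for _, feature in text_gallery]

ranking = retrieve(query_latent, gallery_latents)
top_label = text_gallery[ranking[0]][0]
print(ranking, top_label)
```

In this toy setup the image query and the text gallery have different feature dimensions, so they can only be compared after both are projected into the shared latent space, which is the role the abstract assigns to the learned common space.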

Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Publication Date: Jan 31, 2021
Citations: 23

Similar Papers

Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval
Kaiyi Lin ... Lianli Gao
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 34
03 Apr 2020

Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning
Rui Gao ... Jiaxin Chen
IEEE Transactions on Image Processing | VOL. 29
01 Jan 2020

Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval
Shengsheng Qian ... Changsheng Xu
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021

Defect identification method for ultrasonic inspection of pipeline welds based on cross-modal zero-shot learning
Yu Zeyu ... Du Guofeng
Measurement Science and Technology | VOL. 35
02 Nov 2023
