Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval

Kaiyi Lin, Lianli Gao, Xing Xu, Heng Tao Shen, Zheng Wang

doi:10.1609/aaai.v34i07.6817

Abstract

Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modalities. It is challenging not only because of the heterogeneous distributions across modalities, but also because of the inconsistent semantics between seen and unseen classes. A handful of recently proposed methods borrow ideas from zero-shot learning: they exploit word embeddings of class labels (i.e., class-embeddings) as a common semantic space, and use generative adversarial networks (GANs) to capture the underlying multimodal data structures and to strengthen the relations between input data and the semantic space, so as to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN-based methods for ZS-CMR. Rather than using the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space of input multimodal features and class-embeddings via modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlation associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in the latent space, which improves retrieval efficiency and enables knowledge transfer to unseen classes. We evaluate our model on four benchmark datasets for image-text retrieval and one large-scale dataset for image-sketch retrieval. The experimental results show that our method establishes new state-of-the-art performance for both tasks on all datasets.
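The abstract names three training ingredients: modality-specific VAEs that map each modality into a shared latent space, a cross-reconstruction criterion, and a cross-alignment criterion between the latent distributions of the multimodal features and of the class-embeddings. The pure-Python sketch below shows one way such an objective could be assembled. It is not the authors' implementation. The abstract does not specify the distance used for alignment or the reconstruction error, so the closed-form 2-Wasserstein distance between diagonal Gaussians, the squared-error reconstruction, the weights `beta`/`gamma`/`delta`, and all function names (`lcale_style_loss`, `make_toy_encoder`, and so on) are assumptions chosen for illustration. The class-embedding branch acts as the alignment anchor, following the abstract's description.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Gaussian = Tuple[Vector, Vector]  # (mean, log-variance) of a diagonal Gaussian


def reparameterize(mu: Vector, logvar: Vector, rng: random.Random) -> Vector:
    # z = mu + sigma * eps with eps ~ N(0, I): the standard VAE sampling trick.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def wasserstein2_diag(mu_a: Vector, lv_a: Vector, mu_b: Vector, lv_b: Vector) -> float:
    # 2-Wasserstein distance between two diagonal Gaussians (assumed alignment metric).
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum((math.exp(0.5 * la) - math.exp(0.5 * lb)) ** 2 for la, lb in zip(lv_a, lv_b))
    return math.sqrt(mean_term + std_term)


def squared_error(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def lcale_style_loss(encoders: Dict[str, Callable[[Vector], Gaussian]],
                     decoders: Dict[str, Callable[[Vector], Vector]],
                     inputs: Dict[str, Vector], rng: random.Random,
                     anchor: str = "class", beta: float = 1.0,
                     gamma: float = 1.0, delta: float = 1.0) -> float:
    # Encode every modality into a diagonal Gaussian and draw one latent sample.
    stats = {m: encoders[m](x) for m, x in inputs.items()}
    z = {m: reparameterize(mu, lv, rng) for m, (mu, lv) in stats.items()}

    # Per-modality VAE objective: self-reconstruction + beta * KL.
    vae = sum(squared_error(decoders[m](z[m]), inputs[m]) + beta * kl_to_standard_normal(*stats[m])
              for m in inputs)

    cross_rec, align = 0.0, 0.0
    for m in inputs:
        if m == anchor:
            continue
        # Cross-reconstruction: decode one modality's latent with the other modality's decoder.
        cross_rec += squared_error(decoders[anchor](z[m]), inputs[anchor])
        cross_rec += squared_error(decoders[m](z[anchor]), inputs[m])
        # Cross-alignment: pull this modality's latent distribution toward the anchor's.
        align += wasserstein2_diag(*stats[m], *stats[anchor])
    return vae + gamma * cross_rec + delta * align


def make_toy_encoder(scale: float) -> Callable[[Vector], Gaussian]:
    # Hypothetical stand-in for a neural encoder: mean = scaled input, fixed log-variance.
    return lambda x: ([scale * v for v in x], [-1.0] * len(x))


def identity_decoder(z: Vector) -> Vector:
    # Hypothetical stand-in for a neural decoder (latent dim == input dim in this toy).
    return list(z)


rng = random.Random(0)
encoders = {"image": make_toy_encoder(0.5), "text": make_toy_encoder(0.8), "class": make_toy_encoder(1.0)}
decoders = {m: identity_decoder for m in encoders}
inputs = {"image": [1.0, 2.0], "text": [0.5, 1.5], "class": [1.0, 1.0]}
loss = lcale_style_loss(encoders, decoders, inputs, rng)
print(f"toy loss: {loss:.3f}")
```

In a real model the encoders and decoders would be trainable networks and the loss would be minimized by gradient descent. The toy linear stand-ins here only exercise the arithmetic. The 2-Wasserstein distance is used because it has a closed form for diagonal Gaussians; any other divergence between the latent distributions could be substituted.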

Journal: Proceedings of the AAAI Conference on Artificial Intelligence	Publication Date: Apr 3, 2020
Citations: 30

Similar Papers

Defect identification method for ultrasonic inspection of pipeline welds based on cross-modal zero-shot learning
Yu Zeyu ... Du Guofeng
Measurement Science and Technology | VOL. 35
02 Nov 2023

Zero-shot Cross-modal Retrieval by Assembling AutoEncoder and Generative Adversarial Network
Xing Xu ... Huimin Lu
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 17
31 Jan 2021

A Probabilistic Zero-Shot Learning Method via Latent Nonnegative Prototype Synthesis of Unseen Classes
Haofeng Zhang ... Ling Shao
IEEE Transactions on Neural Networks and Learning Systems | VOL. 31
01 Jan 2019

Transductive Visual-Semantic Embedding for Zero-shot Learning
Xing Xu ... Jie Shao
06 Jun 2017
