Abstract

Cross-media retrieval (CMR) offers a flexible retrieval experience across multiple modalities. Existing CMR approaches are constrained by the assumption that paired modalities are available during training, and they leverage the data of all modalities to obtain a common representation. However, when data from a new modality arrive, all previously seen modalities must be re-trained, which compromises the flexibility and practicality of CMR. In this paper, we propose learning a Modality-Agnostic Representation for Scalable cross-media retrieval (MARS), which allows each modality to be trained independently. Specifically, MARS treats label information as a distinct modality and introduces a label parsing module, LabNet, to generate semantic representations that correlate the different modalities. Meanwhile, MARS constructs a modality-specific representation module, DataNet, to obtain a modality-shared representation and a modality-exclusive representation, equipped with unbiased semantic classification. Technically, for the first modality, we jointly train LabNet and its DataNet to preserve the semantic similarity between the label-derived representation and the modality-shared representation. For each new modality, MARS employs the well-learned LabNet to extract the label representation, which then serves as privileged information to guide the training of the associated DataNet under the same objective. Furthermore, we assign the same classifier to the representation modules of all modalities for better semantic alignment. With this scheme, the learned modality-shared representation is modality-agnostic. Extensive experiments on several benchmark multi-modality datasets demonstrate that MARS achieves better results than existing methods.
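
The sketch below illustrates the training scheme described above: a LabNet over labels, a per-modality DataNet with shared and exclusive branches, a classifier reused across modalities, and a per-modality training loop in which LabNet is trained jointly for the first modality and frozen for later ones. Only the module names LabNet and DataNet come from the abstract; the layer sizes, the cosine-similarity and classification losses, and all hyperparameters are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of the MARS training flow, assuming multi-hot labels and
# pre-extracted feature vectors per modality. Losses and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabNet(nn.Module):
    """Parses multi-hot label vectors into a semantic representation."""
    def __init__(self, num_labels, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_labels, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))
    def forward(self, y):
        return self.net(y)

class DataNet(nn.Module):
    """Modality-specific encoder with shared and exclusive branches."""
    def __init__(self, in_dim, dim=512):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU())
        self.shared_head = nn.Linear(dim, dim)     # modality-shared representation
        self.exclusive_head = nn.Linear(dim, dim)  # modality-exclusive representation
    def forward(self, x):
        h = self.backbone(x)
        return self.shared_head(h), self.exclusive_head(h)

def train_modality(datanet, labnet, classifier, loader, freeze_labnet, epochs=10):
    """Train one modality independently. For the first modality LabNet is trained
    jointly; for later modalities it is frozen and only guides the new DataNet."""
    params = list(datanet.parameters())
    if not freeze_labnet:
        params += list(labnet.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)
    for _ in range(epochs):
        for x, y in loader:                       # features and multi-hot labels
            label_repr = labnet(y)
            if freeze_labnet:
                label_repr = label_repr.detach()  # label representation acts as fixed guidance
            shared, _ = datanet(x)
            # pull the modality-shared representation toward the label-derived one
            sim_loss = 1 - F.cosine_similarity(shared, label_repr).mean()
            # the same classifier is reused for every modality for semantic alignment
            cls_loss = F.binary_cross_entropy_with_logits(classifier(shared), y)
            loss = sim_loss + cls_loss
            opt.zero_grad(); loss.backward(); opt.step()
```

In this reading, a single `classifier` (e.g. `nn.Linear(512, num_labels)`) and a single `labnet` are created once and passed to `train_modality` for each modality in turn, so adding a new modality only trains its own DataNet rather than re-training all previous ones.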
