Abstract

Cross-modal retrieval has attracted considerable attention in recent years, and a fundamental yet challenging problem is how to measure the similarity between heterogeneous data modalities. Despite using modality-specific representation learning techniques, most existing shallow or deep models treat the modalities equally and neglect the intrinsic heterogeneity and information imbalance between images and texts. In this paper, we propose an online similarity function learning framework that learns a metric reflecting the cross-modal semantic relation. Observing that successive CNN feature layers naturally represent visual information from low-level visual patterns to high-level semantic abstraction, we propose a new asymmetric image-text similarity formulation that aggregates layer-wise visual-textual similarities, each parameterized by its own bilinear parameter matrix. To learn the aggregated similarity function effectively, we develop three combination strategies: average kernel, multiple kernel learning, and layer gating. The two kernel-based strategies assign uniform layer weights shared by all data pairs, whereas layer gating operates on the original feature representations and assigns instance-aware layer weights to each data pair. All three are learned by preserving the bi-directional relative similarity expressed by a large number of cross-modal training triplets. Experiments on three public datasets demonstrate the effectiveness of our methods.
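
To make the formulation concrete, below is a minimal NumPy sketch of an aggregated layer-wise bilinear similarity and a bi-directional triplet hinge constraint of the kind described above. The variable names, feature dimensions, margin value, and uniform (average-kernel) layer weights are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: aggregated layer-wise bilinear similarity s(x, t) = sum_l alpha_l * x_l^T W_l t
# and a triplet hinge loss enforcing s(x, t+) > s(x, t-) + margin. Shapes/names are assumptions.
import numpy as np

rng = np.random.default_rng(0)

layer_dims = [256, 512, 1024]   # assumed per-layer CNN feature dimensions
d_t = 300                        # assumed text feature dimension

# One bilinear parameter matrix W_l per CNN layer.
W = [rng.normal(scale=0.01, size=(d_l, d_t)) for d_l in layer_dims]
# Average-kernel strategy: uniform layer weights shared by all data pairs.
alpha = np.ones(len(layer_dims)) / len(layer_dims)

def similarity(img_layers, txt, W, alpha):
    """Aggregate the per-layer bilinear similarities across CNN layers."""
    return sum(a * x_l @ W_l @ txt for a, x_l, W_l in zip(alpha, img_layers, W))

def triplet_hinge(img_layers, txt_pos, txt_neg, W, alpha, margin=1.0):
    """Relative-similarity constraint from one image-anchored training triplet."""
    return max(0.0, margin
                    - similarity(img_layers, txt_pos, W, alpha)
                    + similarity(img_layers, txt_neg, W, alpha))

# Toy usage with random vectors standing in for CNN activations and text features.
img_layers = [rng.normal(size=d_l) for d_l in layer_dims]
txt_pos, txt_neg = rng.normal(size=d_t), rng.normal(size=d_t)
print(triplet_hinge(img_layers, txt_pos, txt_neg, W, alpha))
```

Under the multiple-kernel-learning strategy the weights alpha would themselves be learned (still shared by all pairs), while layer gating would replace the fixed alpha with instance-aware weights computed from each data pair's own features.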
