Abstract

There are two mainstream strategies for image-text matching at present. The first, termed joint embedding learning, models the semantic information of both images and sentences in a shared feature subspace, which facilitates the measurement of semantic similarity but focuses only on the global alignment relationship. To explore the local semantic relationship more fully, the second, termed metric learning, learns a complex similarity function that directly outputs a score for each image-text pair. However, it suffers from a significantly heavier computational burden at the retrieval stage. In this paper, we propose a hierarchically joint embedding model that incorporates the local semantic relationship into a joint embedding learning framework. The proposed method learns the shared local and global embedding spaces simultaneously, and models the joint local embedding space with respect to specific local similarity labels, which are easy to obtain from the lexical information of the corpus. Unlike methods based on metric learning, we can prepare fixed representations of both images and sentences by concatenating the normalized local and global representations, which makes efficient retrieval feasible. Experiments show that the proposed model achieves competitive performance compared to existing joint embedding learning models on two publicly available datasets, Flickr30k and MS-COCO.
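As a rough illustration of the retrieval step described above, the following sketch (our own, not taken from the paper; the function names, array shapes, and embedding dimensions are hypothetical) shows how L2-normalized local and global embeddings could be concatenated into a fixed representation so that image-text similarity reduces to a single matrix product.

import numpy as np

def l2_normalize(x, eps=1e-12):
    # Normalize each row vector to unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fixed_representation(local_emb, global_emb):
    # Concatenate the normalized local and global embeddings into one
    # fixed vector per sample, as described in the abstract.
    return np.concatenate(
        [l2_normalize(local_emb), l2_normalize(global_emb)], axis=-1
    )

# Hypothetical precomputed embeddings: 1000 images and 5000 sentences,
# with 256-d local and 512-d global spaces (dimensions are illustrative).
img_repr = fixed_representation(np.random.randn(1000, 256), np.random.randn(1000, 512))
txt_repr = fixed_representation(np.random.randn(5000, 256), np.random.randn(5000, 512))

# Retrieval then reduces to one matrix product over fixed representations,
# avoiding the per-pair network evaluations required by metric-learning models.
scores = img_repr @ txt_repr.T              # (1000, 5000) similarity matrix
top5_sentences = np.argsort(-scores, axis=1)[:, :5]

Because the representations are fixed once computed, they can be indexed offline and reused across queries, which is the source of the efficiency advantage claimed over metric-learning approaches.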
