Abstract

This paper studies the problem of image-text matching, with the goal of matching images and texts more accurately. Existing cross-media retrieval methods exploit only part of the available information: they either match the whole image with the whole sentence, or match selected image regions with individual words. To better reveal the latent connections between image and text semantics, this paper proposes a cross-media image-text retrieval method that fuses two levels of similarity, constructing a cross-media two-level network to explore better matching between images and texts; it contains two subnetworks, one handling global features and one handling local features. Specifically, the image is divided into the whole picture and image regions, and the text is divided into whole sentences and individual words; these are studied separately to explore the full latent alignment between images and texts. A two-level alignment framework is then used so that the two levels reinforce each other, and fusing the two kinds of similarity yields a more complete representation for cross-media retrieval. Experimental evaluation on the Flickr30K and MS-COCO datasets shows that the proposed method makes the semantic matching between images and texts more accurate and outperforms popular international cross-media retrieval methods on all evaluation metrics.
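To make the two-level fusion concrete, the sketch below illustrates one plausible way to combine a global (whole image vs. whole sentence) similarity with a local (regions vs. words) similarity. The weighted-sum fusion, the max-over-regions pooling, and the coefficient `alpha` are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def global_similarity(img_emb, txt_emb):
    """Cosine similarity between a whole-image embedding and a whole-sentence embedding."""
    return F.cosine_similarity(img_emb, txt_emb, dim=-1)

def local_similarity(region_embs, word_embs):
    """Aggregate region-word similarities: for each word, take its best-matching
    region, then average over words (one common pooling choice)."""
    # region_embs: (n_regions, d), word_embs: (n_words, d)
    sim = F.normalize(word_embs, dim=-1) @ F.normalize(region_embs, dim=-1).T
    return sim.max(dim=1).values.mean()

def fused_similarity(img_emb, txt_emb, region_embs, word_embs, alpha=0.5):
    """Fuse the two similarity levels with an assumed weighting coefficient alpha."""
    return (alpha * global_similarity(img_emb, txt_emb)
            + (1 - alpha) * local_similarity(region_embs, word_embs))
```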

Highlights

  • Cross-media refers to the coexistence of complex media objects such as web text, images, audio, and video. It is manifested in the complex relations and organizational structures formed by these media objects

  • Through cross-media, the same semantic information can be expressed from multiple perspectives, so specific content can be reflected more comprehensively than by a single media object and its specific modality

  • By integrating and analyzing multiple modalities, we can understand the content information involved in cross-media retrieval


Summary

INTRODUCTION

Cross-media refers to the coexistence of complex media objects such as web text, images, audio, and video. The main idea of Lee et al. [19] is to use an attention mechanism on both the text and image sides to learn better text and image representations, and to use a hard triplet loss to measure the similarity between text and image in the common subspace. This approach uses attention to obtain better latent semantic alignment. We were inspired by ideas proposed in ranking-based methods, in which a loss function based on triplet loss is used to increase the distance between representations of different semantics across media modalities. In this way, the result is a better matching representation between the image and text modalities.
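As an illustration of such a loss, below is a minimal sketch of a hinge-based triplet ranking loss with hardest-negative mining over a batch of image-text pairs, in the style popularized by VSE++. The margin value and the mining strategy are assumptions for the sketch, not the paper's reported settings.

```python
import torch

def hard_triplet_loss(scores, margin=0.2):
    """Triplet ranking loss with hardest negatives.

    scores: (B, B) similarity matrix where scores[i, j] is the similarity
    between image i and sentence j; the diagonal holds the matching pairs.
    """
    batch_size = scores.size(0)
    diagonal = scores.diag().view(batch_size, 1)
    pos_for_img = diagonal.expand_as(scores)      # s(i, t_i) broadcast over columns
    pos_for_txt = diagonal.t().expand_as(scores)  # s(j, t_j) broadcast over rows

    # Hinge cost for every negative sentence (per image) and negative image (per sentence)
    cost_s = (margin + scores - pos_for_img).clamp(min=0)
    cost_im = (margin + scores - pos_for_txt).clamp(min=0)

    # Zero out the diagonal so matching pairs are not counted as negatives
    mask = torch.eye(batch_size, dtype=torch.bool, device=scores.device)
    cost_s = cost_s.masked_fill(mask, 0)
    cost_im = cost_im.masked_fill(mask, 0)

    # Keep only the hardest negative for each positive pair
    return cost_s.max(dim=1).values.sum() + cost_im.max(dim=0).values.sum()
```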

CROSS-MEDIA TWO-LEVEL NETWORK MODEL
CROSS-MEDIA TWO-LEVEL ALIGNMENT
ANALYSIS OF EXPERIMENTAL RESULTS
EXPERIMENTAL PARAMETER SETTINGS
MAP VALUE ON THE MS-COCO DATASET
Findings
CONCLUSION