Abstract

In this paper, we propose to boost cross-modal retrieval by mutually aligning images and captions in terms of both features and relationships. First, we propose a multi-feature based visual-semantic embedding (MVSE++) space for retrieving candidates in the other modality, which provides a more comprehensive representation of the visual content of objects and the scene context in images. This gives us a better chance of finding an accurate and detailed caption for an image. However, a caption condenses the image content into a semantic description, so the cross-modal neighboring relationships starting from the visual side and from the semantic side are asymmetric. To retrieve better cross-modal neighbors, we propose to re-rank the initially retrieved candidates according to the $k$ nearest reciprocal neighbors in the MVSE++ space. The method is evaluated on the benchmark MSCOCO and Flickr30K datasets with standard metrics. We achieve higher accuracy in caption retrieval and image retrieval at both R@1 and R@10.
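As a concrete illustration of the re-ranking step, below is a minimal sketch of cross-modal k-reciprocal re-ranking for a single image query. It assumes cosine similarity in the joint embedding space and pre-computed image and caption embeddings (the array names img_emb and cap_emb are placeholders); the simple two-group ordering rule is for illustration only and is not necessarily the exact criterion used in the paper.

```python
import numpy as np

def topk_cosine(query, gallery, k):
    """Indices of the k most similar gallery rows to the query (cosine similarity)."""
    sims = gallery @ query / (np.linalg.norm(gallery, axis=1) * np.linalg.norm(query) + 1e-12)
    return np.argsort(-sims)[:k]

def k_reciprocal_rerank(query_idx, img_emb, cap_emb, k=10):
    """Re-rank the caption candidates of image `query_idx`: a candidate is placed in the
    front group only if the query image is also among that caption's k nearest images."""
    candidates = topk_cosine(img_emb[query_idx], cap_emb, k)
    reciprocal, others = [], []
    for c in candidates:
        back = topk_cosine(cap_emb[c], img_emb, k)  # neighbors from the caption back to images
        (reciprocal if query_idx in back else others).append(c)
    # reciprocal neighbors first; each group keeps its original similarity order
    return reciprocal + others
```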

Highlights

  • The task of image-caption retrieval aims at finding corresponding sentences given an image query or retrieving images with a sentence query

  • To retrieve a better cross-modal neighbor, we propose to re-rank the initially retrieved candidates according to the k nearest reciprocal neighbors in MVSE++ space

  • We show that the multi-feature representation, which avoids visual-semantic misalignment with respect to both objects and scene context, is more representative for image-caption retrieval than the single features used previously (see the sketch after this list)
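As a rough illustration of what a multi-feature image encoder might look like, the sketch below concatenates object-centric features (e.g., from an ImageNet-trained CNN) with scene-context features (e.g., from a Places-trained CNN) and projects them into the joint space. The feature dimensions and the single linear projection are assumptions made for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiFeatureImageEncoder(nn.Module):
    """Illustrative multi-feature image encoder: object features and scene-context
    features are fused and projected into one joint embedding space.
    Dimensions are placeholders, not the paper's configuration."""
    def __init__(self, obj_dim=2048, scene_dim=2048, joint_dim=1024):
        super().__init__()
        self.fc = nn.Linear(obj_dim + scene_dim, joint_dim)

    def forward(self, obj_feat, scene_feat):
        fused = torch.cat([obj_feat, scene_feat], dim=1)   # concatenate the two feature views
        return F.normalize(self.fc(fused), dim=1)          # L2-normalize for cosine similarity
```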


Summary

INTRODUCTION

The task of image-caption retrieval aims at finding corresponding sentences given an image query, or retrieving images with a sentence query. VSE [16] embeds deep visual features and deep semantic features into a cross-modal space trained with a bi-directional ranking loss. We propose to retrieve candidates in the other modality in a multi-feature based VSE++ (MVSE++) space. This multiple-visual-feature embedding provides a more comprehensive representation of the visual content of objects and the scene context in images, and therefore yields a better initial retrieval. To the best of our knowledge, we achieve the highest accuracy in caption retrieval and image retrieval at both R@1 and R@10.
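For reference, a common form of the bi-directional ranking loss in VSE++-style models is a hinge loss over both retrieval directions using the hardest negative in the batch; the sketch below follows that standard formulation. The margin value and the hardest-negative choice are shown as assumptions about how the loss is instantiated, not as the paper's exact settings.

```python
import torch

def bidirectional_ranking_loss(img_emb, cap_emb, margin=0.2):
    """Hinge-based bi-directional ranking loss over a batch of matched
    (image, caption) pairs, using the hardest in-batch negative (VSE++ style)."""
    scores = img_emb @ cap_emb.t()                  # similarities; embeddings assumed L2-normalized
    pos = scores.diag().view(-1, 1)                 # similarity of each matched pair
    cost_cap = (margin + scores - pos).clamp(min=0)      # caption retrieval: negative captions per image
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # image retrieval: negative images per caption
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_cap = cost_cap.masked_fill(mask, 0)        # do not penalize the matched pair itself
    cost_img = cost_img.masked_fill(mask, 0)
    # hardest negative per image (rows) and per caption (columns)
    return cost_cap.max(dim=1)[0].mean() + cost_img.max(dim=0)[0].mean()
```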

RELATED WORK
THE CROSS-MODAL K-RECIPROCAL NEAREST NEIGHBOR BASED RE-RANKING
CONCLUSION