Abstract

Image-text matching requires jointly modeling visual and textual information, and the foremost challenge is finding the correspondence between the two modalities. Existing methods either train the matching network on pre-extracted visual features or rely on complex Transformer architectures to extract image and text features for matching. In this paper, we design a simple image-text matching network that can be trained end-to-end. We conduct comparative experiments on the Flickr30K and MSCOCO datasets; the results show that our network outperforms recent methods on Flickr30K, and we further analyze its performance on the VQAv2 dataset.
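The abstract does not describe the network in detail, so as a rough illustration of the general end-to-end image-text matching setup it refers to, here is a minimal dual-encoder sketch in PyTorch with a symmetric contrastive loss. All names, dimensions, and the loss choice (DualEncoderMatcher, img_dim=2048, temperature=0.07, InfoNCE) are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualEncoderMatcher(nn.Module):
    """Hypothetical minimal matcher: projects image and text features into a
    shared embedding space and scores pairs by cosine similarity."""

    def __init__(self, img_dim=2048, txt_dim=768, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)

    def forward(self, img_feats, txt_feats):
        # L2-normalize so the dot product equals cosine similarity.
        img_emb = F.normalize(self.img_proj(img_feats), dim=-1)
        txt_emb = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img_emb @ txt_emb.t()  # (batch, batch) similarity matrix


def contrastive_loss(sim, temperature=0.07):
    """Symmetric InfoNCE-style loss: matched pairs lie on the diagonal."""
    targets = torch.arange(sim.size(0), device=sim.device)
    logits = sim / temperature
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2


# Toy usage with random tensors standing in for encoder outputs.
model = DualEncoderMatcher()
imgs = torch.randn(8, 2048)   # e.g. image-encoder features
txts = torch.randn(8, 768)    # e.g. text-encoder features
loss = contrastive_loss(model(imgs, txts))
loss.backward()
```

Because the projection layers (and, in a full model, the underlying encoders) receive gradients from the matching loss directly, this kind of pipeline is trainable end-to-end rather than depending on frozen, pre-extracted visual features.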
