Abstract

Image-text matching aims to build connections between visual and textual information. Current methods have made great progress by exploiting both global alignment between images and sentences and local alignment between image regions and their corresponding words. However, the correlation between global alignment and local alignment is ignored to some extent. In this paper, we therefore propose a new region-feature-enhancement-based image-text similarity inference network. First, image region features are enhanced by a graph convolutional neural network, which captures the correlations among different region features and models their semantic relationships. Second, we propose a vector-based similarity representation that describes local and global alignment in a more comprehensive way. Finally, a graph convolutional network is introduced to construct a similarity graph that propagates the correlation between local alignment and global alignment to every part. Experiments on the MSCOCO and Flickr30K datasets show that our proposed method achieves competitive accuracy.
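The region-feature enhancement step described above can be illustrated with a minimal graph-convolution sketch. This is not the paper's actual implementation; the region count (36), feature dimensions, and the random affinity graph are illustrative assumptions, and a single layer of the standard symmetric-normalized GCN update is used as a stand-in for the network the abstract describes.

```python
import numpy as np

def gcn_layer(features, adjacency, weight):
    """One graph-convolution step over image-region features.

    features:  (n_regions, d_in)       region feature vectors
    adjacency: (n_regions, n_regions)  semantic-relationship affinities
    weight:    (d_in, d_out)           learnable projection (random here)
    """
    # Add self-loops so each region retains its own feature.
    a_hat = adjacency + np.eye(adjacency.shape[0])
    # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    # Propagate: each region aggregates features of semantically related regions.
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU

rng = np.random.default_rng(0)
regions = rng.normal(size=(36, 2048))                # e.g. 36 detected regions
adj = rng.random((36, 36)); adj = (adj + adj.T) / 2  # symmetric affinity graph
w = rng.normal(size=(2048, 1024)) * 0.01             # illustrative projection
enhanced = gcn_layer(regions, adj, w)
print(enhanced.shape)  # (36, 1024)
```

After this step, each enhanced region vector mixes in information from related regions, which is what allows the later local-global similarity graph to reason over inter-region context.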
