Abstract
Image-text matching aims to establish connections between visual content and natural language. Existing methods have made substantial progress by combining global alignment between whole images and sentences with local alignment between image regions and their corresponding words. However, the correlation between global alignment and local alignment has been largely overlooked. In this paper, we therefore propose a region-feature-enhancement-based image-text similarity inference network. First, image region features are enhanced with a graph convolutional network that captures the correlations among different regions and models the semantic relationships between their features. Second, we propose a vector-based similarity representation that describes local and global alignment in a more comprehensive way than a single scalar score. Finally, a graph convolutional network is introduced to construct a similarity graph that propagates the correlation between local and global alignment to every part of the representation. Experiments on the MSCOCO and Flickr30K datasets show that the proposed method achieves competitive accuracy.
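The abstract outlines three components: GCN-based region feature enhancement, a vector-valued similarity between visual and textual features, and graph reasoning over local and global similarity nodes. The sketch below is only an illustration of these ideas, not the authors' implementation: all module names, dimensions, the affinity-based adjacency, and the naive region-word pairing are assumptions made to keep the example short and runnable in PyTorch.

```python
# Minimal sketch (assumptions throughout), NOT the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionGCN(nn.Module):
    """Step 1: enhance region features with a GCN whose adjacency is
    built from pairwise feature affinity (an assumption in this sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, regions):                        # regions: (B, R, D)
        adj = torch.softmax(regions @ regions.transpose(1, 2), dim=-1)
        return F.relu(regions + adj @ self.proj(regions))


class SimilarityVector(nn.Module):
    """Step 2: a vector-valued similarity between an image-side and a
    text-side feature, instead of a single cosine scalar."""
    def __init__(self, dim, sim_dim):
        super().__init__()
        self.fc = nn.Linear(dim, sim_dim)

    def forward(self, u, v):                           # u, v: (..., D)
        return F.normalize(self.fc((u - v) ** 2), dim=-1)


class SimilarityGraphReasoning(nn.Module):
    """Step 3: propagate information between local and global similarity
    vectors over a similarity graph and read off a matching score."""
    def __init__(self, sim_dim):
        super().__init__()
        self.edge = nn.Linear(sim_dim, sim_dim)
        self.node = nn.Linear(sim_dim, sim_dim)
        self.score = nn.Linear(sim_dim, 1)

    def forward(self, nodes):                          # nodes: (B, N+1, S), global node first
        adj = torch.softmax(self.edge(nodes) @ nodes.transpose(1, 2), dim=-1)
        nodes = F.relu(self.node(adj @ nodes))
        return self.score(nodes[:, 0]).squeeze(-1)     # score from the global node


# Toy usage with random features (B=2 pairs, R=36 regions, T=12 words).
B, R, T, D, S = 2, 36, 12, 1024, 256
regions, words = torch.randn(B, R, D), torch.randn(B, T, D)
regions = RegionGCN(D)(regions)
sim = SimilarityVector(D, S)
glob = sim(regions.mean(1), words.mean(1)).unsqueeze(1)           # (B, 1, S)
local = sim(regions[:, :T], words)                                # (B, T, S); naive pairing for brevity
score = SimilarityGraphReasoning(S)(torch.cat([glob, local], 1))  # (B,)
```

A real system would pair regions and words with cross-modal attention rather than the positional pairing used above; the sketch only shows how the three stages fit together.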