Abstract

Cross-media retrieval aims to discover the relationships between samples of different modalities, so that samples of one modality can be used to retrieve semantically similar samples of another. Existing cross-media retrieval methods exploit only part of the available image and text information: they either match the whole image against the whole sentence, or match individual image regions against individual words. To make better use of the combined features of images and text, this paper proposes a cross-media image-text retrieval method that fuses two levels of similarity to find better matches between image and text semantics. Specifically, the image is decomposed into the whole picture and a set of image regions, and the text into the whole sentence and a set of words; each granularity is modeled separately to explore the full latent alignment between images and text. A two-level alignment framework then lets the two granularities reinforce each other, and fusing the two similarities yields a more complete representation for cross-media retrieval. Experimental results on the Flickr30K and MS-COCO datasets show that this model achieves a higher recall than many state-of-the-art cross-media retrieval models.
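
As a rough illustration of the fused scoring scheme the abstract describes, the sketch below (Python with PyTorch) combines a global image-sentence similarity with a local region-word similarity. The function names, the max-over-regions aggregation, and the fusion weight alpha are assumptions made for illustration; they are not the paper's exact formulation.

import torch
import torch.nn.functional as F

def global_similarity(img_emb, sent_emb):
    # Whole-image vs. whole-sentence cosine similarity.
    # img_emb, sent_emb: (d,) embedding vectors.
    return F.cosine_similarity(img_emb, sent_emb, dim=-1)

def local_similarity(region_embs, word_embs):
    # region_embs: (n_regions, d); word_embs: (n_words, d).
    # Pairwise cosine similarity between every region and every word.
    sim = F.cosine_similarity(region_embs.unsqueeze(1),
                              word_embs.unsqueeze(0), dim=-1)
    # For each word, keep its best-matching region, then average over
    # words (one plausible aggregation; the paper may use another).
    return sim.max(dim=0).values.mean()

def fused_similarity(img_emb, sent_emb, region_embs, word_embs, alpha=0.5):
    # alpha is a hypothetical weight balancing the two similarity levels.
    return (alpha * global_similarity(img_emb, sent_emb)
            + (1 - alpha) * local_similarity(region_embs, word_embs))

At retrieval time, candidates of the other modality would be ranked by fused_similarity, so that both the global and the region-word granularities contribute to the final score.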

