Abstract

As adversarial machine learning (AML) attracts growing attention, developing more robust and generalizable algorithms has become a key research topic. Image-text matching, the foundation of tasks such as video question answering and text-to-image generation, is also exposed to a variety of attacks in AML. Current image-text matching methods based on the similarity of matched fragments focus only on local matching results and do not build a comprehensive understanding of the content of the text and image, which leads to mismatches on abstract scenes under complex attacks. Existing methods are also not sensitive enough to the internal relationships between objects in different local areas, which further confuses matching. To address these problems, a global similarity matching module is proposed, which is dynamically fused with local similarity to measure matching results flexibly and to improve the understanding of abstract scenes. Furthermore, a global-local cognition fusion training mechanism based on relationship adversarial sample generation is proposed to enhance the understanding of internal relationships between objects in different local areas. A global loss is introduced to train the overall model, and the proportion of global and local loss is adjusted during training so that the model better identifies the relationships between objects in different local areas and avoids the confusion and mismatching caused by similar matching objects. Experimental results show that the proposed method is 7.4 % (rSum) better than the SOTA method on the Flickr30K dataset, and 4.0 % (rSum using the 1K test set) better on the MS-COCO dataset. The proposed global-local fusion (GLF) algorithm for image-text matching, based on adversarial sample generation, improves the accuracy and robustness of image-text matching and performs well against some security challenges, promoting the development of visual and linguistic modal fusion.
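To make the idea of dynamically fusing global and local similarity more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the fusion rule, so everything here is an assumption for illustration: the function names, the mean pooling used for the global embedding, the max-over-regions rule used for the local score, and the sigmoid gate with its hypothetical `gate_scale` parameter.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pool(fragments):
    """Average a list of fragment vectors into one global vector (assumed pooling)."""
    n = len(fragments)
    return [sum(column) / n for column in zip(*fragments)]


def local_similarity(regions, words):
    """Assumed local score: each word takes its best-matching region, then average."""
    best = [max(cosine(w, r) for r in regions) for w in words]
    return sum(best) / len(best)


def global_similarity(regions, words):
    """Assumed global score: cosine between the pooled image and pooled text vectors."""
    return cosine(mean_pool(regions), mean_pool(words))


def fused_similarity(regions, words, gate_scale=4.0):
    """Hypothetical dynamic fusion: a sigmoid gate weights the global score.

    The gate shifts weight toward whichever score is larger; the result is a
    convex combination, so it always lies between the two input scores.
    """
    s_local = local_similarity(regions, words)
    s_global = global_similarity(regions, words)
    alpha = 1.0 / (1.0 + math.exp(-gate_scale * (s_global - s_local)))
    return alpha * s_global + (1.0 - alpha) * s_local


# Toy example: two image regions and one word, each a 2-D feature vector.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0]]
score = fused_similarity(regions, words)
```

In a trained model the gate would presumably be learned rather than fixed, and the same weighting idea could be applied to the global and local loss terms during training; the abstract gives no further detail on either point.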
