Abstract

Adversarial examples have been successfully generated for various image classification models. Recently, several methods have been proposed to generate adversarial examples for more sophisticated tasks such as image captioning and visual question answering (VQA). In this paper, we propose a targeted adversarial attack for VQA in which noise is added only to the background pixels of the image, leaving the rest of the image unchanged. We evaluate the attack on two state-of-the-art VQA systems, the End-to-End Neural Module Network (N2NMN) and the Memory, Attention and Composition Network (MAC network), across three datasets: SHAPES, CLEVR, and VQA v2.0. For SHAPES we combine the validation and test sets, for CLEVR we select 1000 image-question pairs from the validation set, and for VQA v2.0 we select 500 image-question pairs from the validation set. We study the attack under two settings, same-category and different-category, referring to whether or not the target adversarial answer lies in the same category as the original answer. On CLEVR, the attack achieves a 100% success rate for both models in the same-category setting, and success rates of 22.3% for N2NMN and 73.9% for the MAC network in the different-category setting. On SHAPES, the attack achieves a success rate of 68.9% for N2NMN. The attack also achieves a high success rate in the same-category setting on VQA v2.0. Finally, we provide a strong rationale for the robustness of N2NMN to the different-category attack.
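As a concrete illustration of the idea, below is a minimal sketch of how a background-restricted targeted attack could be implemented, written here as masked projected gradient descent in PyTorch. The `model` interface, `background_mask`, and all hyperparameters are illustrative assumptions, not the paper's exact optimization procedure.

```python
import torch
import torch.nn.functional as F

def targeted_background_attack(model, image, question, target_answer,
                               background_mask, eps=0.05, alpha=0.01, steps=100):
    """Masked targeted PGD sketch (illustrative, not the paper's exact method).

    Assumed interface: model(image, question) -> answer logits of shape
    [batch, num_answers]; background_mask is a {0, 1} tensor that is 1 on
    background pixels; image values lie in [0, 1].
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv = (image + delta * background_mask).clamp(0.0, 1.0)
        logits = model(adv, question)
        # Targeted attack: push the prediction toward target_answer.
        loss = F.cross_entropy(logits, target_answer)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend toward the target class
            delta.clamp_(-eps, eps)             # keep the perturbation small
            delta.grad.zero_()
    return (image + delta.detach() * background_mask).clamp(0.0, 1.0)
```

Because the perturbation is multiplied by the mask before it ever touches the image, foreground pixels are never modified; only the background carries the adversarial noise. The paper's actual loss, step schedule, and noise budget may differ from this sketch.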
