Abstract

Adversarial examples have been successfully generated for various image classification models. Recently, several methods have been proposed to generate adversarial examples for more sophisticated tasks such as image captioning and visual question answering (VQA). In this paper, we propose a targeted adversarial attack for VQA in which noise is added only to the background pixels of the image, leaving the rest of the image unchanged. We evaluate the attack on two state-of-the-art VQA systems, the End-to-End Neural Module Network (N2NMN) and the Memory, Attention and Composition Network (MAC network), across three datasets: SHAPES, CLEVR, and VQA v2.0. For SHAPES we combine the validation and test sets, for CLEVR we select 1000 image-question pairs from the validation set, and for VQA v2.0 we select 500 image-question pairs from the validation set. We study the attack under two settings, same-category and different-category, referring to whether or not the target adversarial answer lies in the same category as the original answer. On CLEVR, the attack achieves a 100% success rate for both models in the same-category setting, and success rates of 22.3% for N2NMN and 73.9% for the MAC network in the different-category setting. On SHAPES, the attack achieves a success rate of 68.9% for N2NMN. The attack also achieves a high success rate in the same-category setting on VQA v2.0. Finally, we provide a strong rationale for the robustness of N2NMN to the different-category attack.
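As a concrete illustration of the idea, below is a minimal sketch of how a background-restricted targeted attack could be implemented, written here as masked projected gradient descent in PyTorch. The `model` interface, `background_mask`, and all hyperparameters are illustrative assumptions, not the paper's exact optimization procedure.

```python
import torch
import torch.nn.functional as F

def targeted_background_attack(model, image, question, target_answer,
                               background_mask, eps=0.05, alpha=0.01, steps=100):
    """Masked targeted PGD sketch (illustrative, not the paper's exact method).

    Assumed interface: model(image, question) -> answer logits of shape
    [batch, num_answers]; background_mask is a {0, 1} tensor that is 1 on
    background pixels; image values lie in [0, 1].
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv = (image + delta * background_mask).clamp(0.0, 1.0)
        logits = model(adv, question)
        # Targeted attack: push the prediction toward target_answer.
        loss = F.cross_entropy(logits, target_answer)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend toward the target class
            delta.clamp_(-eps, eps)             # keep the perturbation small
            delta.grad.zero_()
    return (image + delta.detach() * background_mask).clamp(0.0, 1.0)
```

Because the perturbation is multiplied by the mask before it ever touches the image, foreground pixels are never modified; only the background carries the adversarial noise. The paper's actual loss, step schedule, and noise budget may differ from this sketch.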
