Abstract

Deep neural networks (DNNs) are vulnerable to adversarial examples, which are crafted by adding small, human-imperceptible perturbations to original images yet cause the model to output incorrect predictions. Before DNNs are deployed in safety-critical applications, adversarial attacks are therefore an important means of evaluating and selecting robust models. However, under the challenging black-box setting, the attack success rate, i.e., the transferability of adversarial examples, still needs to be improved. Building on image augmentation methods, this paper finds that randomly transforming the brightness of the image can alleviate overfitting during the generation of adversarial examples and improve their transferability. In light of this phenomenon, we propose an adversarial example generation method that can be integrated with Fast Gradient Sign Method (FGSM)-related methods to build a more robust gradient-based attack and generate adversarial examples with better transferability. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of this method: on both normally and adversarially trained networks, it achieves higher black-box attack success rates than other attack methods based on data augmentation. We hope this method can help evaluate and improve the robustness of models.
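
The core idea described above is to apply a random brightness transformation to the input at every iteration of an FGSM-style attack such as MI-FGSM. The following is a minimal sketch of that idea in PyTorch, not the authors' released implementation: the function name `rt_mi_fgsm`, the brightness range, the hyper-parameter defaults, and the assumption that images are batches scaled to [0, 1] are all illustrative.

```python
# Minimal sketch (assumed names and hyper-parameters, not the paper's code):
# random brightness rescaling folded into an MI-FGSM-style iterative attack.
import torch
import torch.nn.functional as F


def rt_mi_fgsm(model, x, y, eps=16 / 255, steps=10, mu=1.0,
               brightness_range=(0.7, 1.3)):
    """Craft adversarial examples, randomly rescaling brightness each step."""
    alpha = eps / steps                      # per-step budget
    g = torch.zeros_like(x)                  # accumulated momentum
    x_adv = x.clone().detach()

    for _ in range(steps):
        x_adv.requires_grad_(True)

        # Randomly rescale brightness before computing the gradient,
        # so each iteration sees a slightly different view of the image.
        factor = torch.empty(x.size(0), 1, 1, 1,
                             device=x.device).uniform_(*brightness_range)
        x_rt = torch.clamp(x_adv * factor, 0.0, 1.0)

        loss = F.cross_entropy(model(x_rt), y)
        grad = torch.autograd.grad(loss, x_adv)[0]

        # MI-FGSM-style momentum: gradient normalised by its mean absolute value.
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        x_adv = x_adv.detach() + alpha * g.sign()

        # Project back into the eps-ball around the clean image and valid pixel range.
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 1.0)

    return x_adv.detach()
```

The momentum accumulation and sign step follow the standard MI-FGSM update; only the per-iteration brightness rescaling is specific to the random-transformation idea.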

Highlights

  • In image recognition, experiments on standard test sets have shown that deep neural networks (DNNs) can exceed human recognition accuracy [1,2,3,4]

  • The results show that our method generally performs better on both normally and adversarially trained networks, and that RT-MI-FGSM (the proposed random transformation combined with the momentum iterative Fast Gradient Sign Method, MI-FGSM) achieves higher black-box attack success rates than the diverse input method (DIM)

  • We propose a new data augmentation-based attack that randomly transforms the brightness of the input image at each iteration of the attack, alleviating overfitting and generating adversarial examples with greater transferability

Introduction

Experiments on standard test sets have shown that deep neural networks (DNNs) can exceed human recognition accuracy [1,2,3,4]. Nevertheless, DNNs are highly vulnerable to adversarial examples [5, 6]: adding perturbations to an input image that are imperceptible to humans can cause the model to misclassify it. Adversarial examples usually have a certain degree of transferability, meaning that examples generated for one model may also be adversarial to another, which enables black-box attacks [8]. Transferable adversarial examples therefore pose a serious threat to the security of AI systems: they can disrupt AI-driven intelligent systems, cause missed detections and misjudgments, and even bring a system down. It is thus important and urgent to study the causes and nature of adversarial examples.
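
Since transferability is what makes black-box attacks possible, it is typically measured by crafting adversarial examples on a white-box surrogate model and checking how often an unseen target model misclassifies them. The sketch below illustrates that evaluation protocol under stated assumptions: the torchvision model choices, the `Normalized` wrapper, and the reuse of the `rt_mi_fgsm` sketch above are illustrative, not the paper's exact experimental setup.

```python
# Hedged sketch of a transferability evaluation: attack a surrogate (white-box),
# then measure the success rate on a different, unseen target model (black-box).
import torch
import torch.nn as nn
from torchvision import models


class Normalized(nn.Module):
    """Wrap an ImageNet classifier so it accepts raw images in [0, 1]."""
    def __init__(self, net):
        super().__init__()
        self.net = net
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, x):
        return self.net((x - self.mean) / self.std)


surrogate = Normalized(models.resnet50(weights="IMAGENET1K_V1")).eval()    # white-box model
target = Normalized(models.densenet121(weights="IMAGENET1K_V1")).eval()    # unseen target model


@torch.no_grad()
def transfer_success_rate(x_adv, y):
    """Fraction of adversarial examples that the target model misclassifies."""
    return (target(x_adv).argmax(dim=1) != y).float().mean().item()

# Usage (x, y: a batch of ImageNet images in [0, 1] and their true labels):
#   x_adv = rt_mi_fgsm(surrogate, x, y)       # craft on the surrogate (white-box)
#   print(transfer_success_rate(x_adv, y))    # evaluate on the target (black-box)
```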
