Neural network based (NN-based) classifiers are known to be vulnerable to adversarial examples: adding slight perturbations to a benign image can cause a classifier to make a false prediction. To evaluate the robustness of NN-based classifiers against adversarial examples, numerous adversarial attacks with high success rates have been proposed recently. NN-based image classifiers usually normalize valid images (e.g., RGB images whose value at each coordinate is an integer between 0 and 255) into a continuous real domain (e.g., a 3-dimensional matrix whose value at each coordinate is a real number between 0 and 1) and make classification decisions on the normalized images. However, adversarial examples crafted in the continuous real domain may become benign once they are denormalized back into the corresponding discrete integer domain, a phenomenon known as the discretization problem. This problem has been mentioned in some prior works but has received relatively little attention. In this work, we report the first comprehensive study of existing works to understand the impacts of the discretization problem. By theoretically analyzing 35 representative methods and empirically studying 20 representative open-source tools, we found that 29 of the 35 methods (theoretically) and 14 of the 20 tools (empirically) are affected by the discretization problem; for example, the attack success rate can drop dramatically from 100 percent to 10 percent after the domain transformation. As a first step towards addressing this problem in the black-box scenario, we propose a novel derivative-free optimization method that crafts adversarial examples directly in the discrete integer domain. Experimental results show that our method achieves nearly 100 percent attack success rates for both targeted and untargeted attacks, comparable to the most popular white-box methods (FGSM, BIM, and C&W), and significantly outperforms representative black-box methods (ZOO, AutoZOOM, NES-PGD, Bandits, FD, FD-PSO, and GenAttack).
Our results suggest that the discretization problem should be taken more seriously, and that discrete optimization algorithms hold promise for crafting effective black-box attacks.
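To make the discretization problem concrete, the following is a minimal NumPy sketch (not the paper's method; the perturbation magnitude and image size are illustrative assumptions). A tiny perturbation is added to a normalized image in the continuous [0, 1] domain, and the image is then denormalized by rounding back to integers in [0, 255]; because the perturbation is smaller than half an integer step (1/255 ≈ 0.0039), every pixel rounds back to its original value and the "adversarial" change vanishes.

```python
import numpy as np

# Start from a valid RGB image: integers in [0, 255].
rng = np.random.default_rng(0)
img_int = rng.integers(0, 256, size=(3, 4, 4), dtype=np.uint8)

# Normalization step typically performed by the classifier.
x = img_int.astype(np.float64) / 255.0

# A tiny "adversarial" perturbation crafted in the continuous domain.
# Its magnitude (0.001) is well below half an integer step, 0.5/255.
delta = rng.uniform(-0.001, 0.001, size=x.shape)
x_adv = np.clip(x + delta, 0.0, 1.0)

# Denormalize: map back to the discrete integer domain by rounding.
img_adv_int = np.rint(x_adv * 255.0).astype(np.uint8)

# Count pixels that actually differ after the round trip.
changed = np.count_nonzero(img_adv_int != img_int)
print(f"pixels changed after denormalization: {changed} / {img_int.size}")
```

Here `changed` is 0: the perturbation that existed in the continuous domain is entirely erased by rounding, so the denormalized image is byte-for-byte the benign original. Attacks that succeed only in the continuous domain can therefore fail once their outputs are saved as ordinary integer-valued images.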