Abstract

In recent years, Deep Neural Networks (DNNs) have shown unprecedented performance in many areas. However, recent studies have revealed their vulnerability to small perturbations added to source inputs. The methods used to generate these perturbations are called adversarial attacks, which fall into two types, black-box and white-box attacks, according to the adversary's access to the target model. To overcome the problem that black-box attackers cannot reach the internals of the target DNN, researchers have put forward a series of strategies. Previous work includes training a local substitute model for the target black-box model via Jacobian-based dataset augmentation and then using the substitute model to craft adversarial examples with white-box methods. In this work, we improve the dataset augmentation so that the substitute models better fit the decision boundary of the target model. Unlike previous work, which only performed non-targeted attacks, we are the first to generate targeted adversarial examples via training substitute models. Moreover, to boost the targeted attacks, we apply the idea of ensemble attacks to substitute training. Experiments on MNIST and GTSRB, two common image-classification datasets, demonstrate the effectiveness and efficiency of our method in boosting targeted black-box attacks; we ultimately attack the MNIST and GTSRB classifiers with success rates of 97.7% and 92.8%, respectively.
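
The improved augmentation rule itself is given in the full text; as context, the following is a minimal sketch of the generic Jacobian-based dataset augmentation that this line of work builds on, written in PyTorch. The names substitute, S_rho, oracle_labels, and lam are illustrative assumptions, not taken from the paper.

    import torch

    def jacobian_augment(substitute, S_rho, oracle_labels, lam=0.1):
        # Generic Jacobian-based augmentation (Papernot-style):
        #   S_{rho+1} = { x + lam * sign(dF_o(x)/dx) : x in S_rho }  U  S_rho,
        # where o is the label the target (oracle) assigns to x and F is the substitute.
        new_points = []
        for x, o in zip(S_rho, oracle_labels):
            x = x.clone().detach().requires_grad_(True)
            logits = substitute(x.unsqueeze(0))   # forward pass of the substitute F
            logits[0, o].backward()               # gradient of the oracle-class logit w.r.t. x
            new_points.append((x + lam * x.grad.sign()).detach())
        return list(S_rho) + new_points           # the augmented dataset S_{rho+1}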

Highlights

  • Deep Neural Networks (DNNs) have been widely used in many areas today, such as self-driving [1], speech recognition [2], image recognition and so on

  • We generate the augmentation based on the dataset Sρ and the substitute model F to obtain Sρ+1; a generic form of this augmentation step is sketched after the abstract above

  • While studying the white-box Carlini and Wagner (C&W) attack [11], we found that the ensemble attack [19,20,21] is highly efficient and effective at increasing the transferability of adversarial examples (a minimal sketch of an ensemble targeted attack follows this list)
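
Since the last highlight refers to ensemble attacks, here is a minimal sketch of a targeted attack that averages the cross-entropy loss of several substitute models, assuming an iterative gradient-sign attack rather than the exact C&W formulation; substitutes, eps, alpha, and steps are illustrative parameters, not values from the paper.

    import torch
    import torch.nn.functional as F

    def ensemble_targeted_attack(substitutes, x, target, eps=0.3, alpha=0.01, steps=40):
        # Iteratively push x toward the target class by minimizing the average
        # loss over all substitute models, staying inside an L-infinity eps-ball.
        x0 = x.clone().detach()                    # clean input, kept for the projection step
        x_adv = x0.clone()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = sum(F.cross_entropy(m(x_adv), target) for m in substitutes) / len(substitutes)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = (x_adv - alpha * grad.sign()).detach()             # step toward the target class
            x_adv = (x0 + (x_adv - x0).clamp(-eps, eps)).clamp(0, 1)   # project into the eps-ball, keep valid pixels
        return x_adv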

Summary

Introduction

Deep Neural Networks (DNNs) have been widely used in many areas today, such as self-driving [1], speech recognition [2], image recognition and so on. Previous studies [6] found that, for an image recognition system, when an adversary purposefully adds tiny perturbations to a clean image, the classifier may misclassify the synthetic image, which looks the same as the source image, into any other desired class [7]. White-box attackers craft adversarial examples based on the internal information of the DNN, which might include the training dataset, outputs, hyper-parameters, gradients, and feature maps. This makes white-box attacks easier to mount than black-box attacks.
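
As a concrete example of the gradient access a white-box attacker exploits, the following is a minimal sketch of the classic Fast Gradient Sign Method (FGSM); it is shown only as an illustration of white-box attacks in general and is not claimed to be the attack used in this work.

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=0.25):
        # One gradient-sign step that increases the loss of the true class y,
        # using the model's gradients (information only a white-box attacker has).
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        return (x + eps * x.grad.sign()).clamp(0, 1).detach()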
