Abstract

Machine learning models, including neural networks, are vulnerable to adversarial examples: inputs crafted from legitimate examples by applying small perturbations that cause the models to misclassify. Algorithms used to generate adversarial examples are called adversarial example generation methods. As the state-of-the-art defense, adversarial training improves the robustness of machine learning models by augmenting the training data with adversarial examples. However, adversarial training is far from perfect, and a deeper understanding of it is needed to further improve its effectiveness. In this paper, we investigate two research questions. The first is whether Method-Based Ensemble Adversarial Training (MBEAT) is beneficial, i.e., whether leveraging adversarial examples generated by multiple methods increases the effectiveness of adversarial training. The second is whether a Round Gap Of Adversarial Training (RGOAT) exists, i.e., whether a neural network model adversarially trained in one round remains vulnerable to adversarial examples subsequently generated from the trained model itself. We design an adversarial training experimental framework to answer these two research questions. We find that MBEAT is indeed beneficial, indicating that it has important practical value. We also find that RGOAT indeed exists, indicating that adversarial training should be an iterative process.
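To make the MBEAT idea concrete, the sketch below illustrates one possible training step that augments a batch with adversarial examples generated by multiple methods. It is a minimal illustration, not the paper's implementation: the attack choices (FGSM and a small-step PGD), the hyperparameters, and all function names are assumptions introduced here for exposition.

```python
# Hypothetical sketch of a method-based ensemble adversarial training step.
# Assumes a PyTorch classifier; FGSM and PGD stand in for the multiple
# generation methods. None of these names come from the paper itself.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """One-step FGSM: x' = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps=0.03, alpha=0.01, steps=5):
    """Iterative PGD, projected back into an L-inf ball of radius eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x.detach() + (x_adv - x.detach()).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

def mbeat_step(model, optimizer, x, y, methods=(fgsm, pgd)):
    """One training step on the clean batch plus adversarial examples
    generated by *multiple* methods (the ensemble-of-methods idea)."""
    batches = [x] + [attack(model, x, y) for attack in methods]
    optimizer.zero_grad()
    loss = sum(F.cross_entropy(model(b), y) for b in batches) / len(batches)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating such steps over multiple rounds, regenerating adversarial examples from the most recently trained model each time, is one natural way to address the round gap (RGOAT) the paper describes.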
