Abstract

Adversarial Training (AT) is one of the most effective defense methods against adversarial examples, in which a model is trained on both clean and adversarial examples. Although AT improves robustness by smoothing the small neighborhood around each example, it reduces accuracy on clean examples. We propose Weighted Adaptive Perturbation Adversarial Training (WAPAT), which is motivated by the adaptive learning rate of model optimizers, to reduce the loss of clean accuracy while improving robustness. In the adversarial example generation stage of adversarial training, we introduce weights based on feature changes to adaptively adjust the perturbation step size for different features. In iterative attacks, if a feature is frequently attacked, we increase the attack strength on that feature; otherwise, we weaken it. WAPAT is a data augmentation method that shortens the distance between adversarial examples and the classification boundary. The generated adversarial examples remain effective as attacks while retaining more of the information in the clean examples. Such adversarial examples therefore help to obtain a more robust model while reducing the loss of recognition accuracy on clean examples. To demonstrate our method, we implement WAPAT in three adversarial training frameworks. Experimental results on CIFAR-10 and MNIST show that WAPAT significantly improves adversarial robustness with less sacrifice of accuracy.

Keywords: Adversarial examples, Adversarial training, Weighted perturbations
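The per-feature step size idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual weighting rule, so everything specific here is an assumption made for illustration: the toy logistic-regression model, the function names `loss_gradient` and `weighted_adaptive_attack`, and the rule itself, which accumulates the gradient magnitude of each feature across iterations (loosely in the spirit of adaptive-learning-rate optimizers) and scales the step of each feature by its accumulated share. It should not be read as the WAPAT algorithm itself.

```python
import math

def loss_gradient(x, y, w, b):
    """Gradient of the logistic loss with respect to the input x (label y in {0, 1})."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]

def weighted_adaptive_attack(x, y, w, b, eps=0.3, alpha=0.1, steps=5):
    """Iterative sign-gradient attack with per-feature step weights.

    The weighting rule below is a hypothetical stand-in, not the paper's rule.
    """
    x_adv = list(x)
    accum = [0.0] * len(x)  # running sum of |gradient| for each feature
    for _ in range(steps):
        grad = loss_gradient(x_adv, y, w, b)
        accum = [a + abs(g) for a, g in zip(accum, grad)]
        mean = sum(accum) / len(accum)
        # Weight > 1 for features attacked more strongly than average, < 1 otherwise.
        weights = [a / mean if mean > 0 else 1.0 for a in accum]
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            x_adv[i] += alpha * weights[i] * sign  # ascend the loss
            # Project back into the L-infinity ball of radius eps around x.
            x_adv[i] = min(max(x_adv[i], x[i] - eps), x[i] + eps)
    return x_adv

# Toy usage: the first feature has the largest model weight, so it is perturbed most.
x_clean = [0.5, 0.5, 0.5]
x_adv = weighted_adaptive_attack(x_clean, y=1, w=[2.0, 0.1, -1.0], b=0.0)
```

In this toy setting the weakly attacked second feature stays close to its clean value, which is one way to read the abstract's claim that the adversarial examples retain more clean-example information. A real image pipeline would also clip `x_adv` to the valid pixel range, which is omitted here.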
