Abstract

Adversarial robustness is critical for deep learning models to defend against adversarial attacks. Although adversarial training is considered one of the most effective ways to improve a model’s adversarial robustness, it usually yields models with lower natural accuracy. In this paper, we argue that, for attackable examples, traditional adversarial training with a fixed-size perturbation ball can create adversarial examples that deviate far from the original class towards the target class. As a result, the model’s performance on natural examples of the target class drops drastically, which lowers natural accuracy. To address this, we propose Data-Adaptive Adversarial Training (DAAT), which adaptively adjusts the perturbation ball to a proper size for each natural example with the help of a naturally trained calibration network. In addition, a dynamic training strategy gives DAAT models impressive robustness while retaining remarkable natural accuracy. Based on a toy example, we theoretically prove the decline in natural accuracy caused by adversarial training and show how the data-adaptive perturbation size helps the model resist it. Finally, empirical experiments on benchmark datasets demonstrate that DAAT models significantly improve natural accuracy compared with strong baselines.

Keywords: Adversarial training, Adversarial attack, Adversarial robustness
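To make the idea of a per-example perturbation size concrete, the following is a minimal sketch on a toy linear classifier. It is not the paper's algorithm: the linear models, the rule of halving the radius until a calibration model still assigns the original label, and all names (`adaptive_eps`, `eps_max`, `shrink`, `min_eps`) are assumptions chosen purely for illustration.

```python
from typing import List

Vector = List[float]


def sign(v: float) -> float:
    """Return -1.0, 0.0, or 1.0 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict(x: Vector, w: Vector, b: float) -> int:
    """Label in {-1, +1} from a linear score w.x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def linf_attack_linear(x: Vector, y: int, w: Vector, eps: float) -> Vector:
    """Worst-case L-infinity perturbation of radius eps against a linear model.

    For a linear score, the margin y * (w.x + b) is minimized by moving each
    coordinate by eps in the direction -y * sign(w_i).
    """
    return [xi - y * eps * sign(wi) for xi, wi in zip(x, w)]


def adaptive_eps(x: Vector, y: int, w: Vector, w_cal: Vector, b_cal: float,
                 eps_max: float, shrink: float = 0.5,
                 min_eps: float = 1e-3) -> float:
    """Hypothetical rule: shrink eps until the calibration model (w_cal, b_cal)
    still assigns the original label y to the adversarial example."""
    eps = eps_max
    while eps > min_eps:
        x_adv = linf_attack_linear(x, y, w, eps)
        if predict(x_adv, w_cal, b_cal) == y:
            return eps
        eps *= shrink
    return 0.0  # no admissible radius found above min_eps


# A point near the decision boundary gets a smaller ball than one far away.
w = [1.0, 0.0]
near = adaptive_eps([1.0, 0.0], 1, w, w_cal=w, b_cal=0.0, eps_max=1.6)
far = adaptive_eps([5.0, 0.0], 1, w, w_cal=w, b_cal=0.0, eps_max=1.6)
print(near, far)
```

In this toy setting, the example close to the boundary ends up with half the maximum radius, while the example far from it keeps the full radius. A fixed radius of 1.6 would instead push the nearby example across the calibration model's boundary, which is the kind of deviation towards the target class that the abstract describes.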
