Revisiting single-step adversarial training for robustness and generalization

Zhuorong Li,Daiwei Yu,Minghui Wu,Sixian Chan,Hongchuan Yu,Zhike Han

doi:10.1016/j.patcog.2024.110356

Abstract

Recently, single-step adversarial training has received high attention because it shows robustness and efficiency. However, a phenomenon referred to as “catastrophic overfitting” has been observed, which is prevalent in single-step defenses and may frustrate attempts to use FGSM adversarial training. To address this issue, we propose a novel method, Stable and Efficient Adversarial Training (SEAT). SEAT mitigates catastrophic overfitting by harnessing on local properties that differentiate a robust model from one prone to catastrophic overfitting. The proposed SEAT is underpinned by robust theoretical justifications, in that minimizing the SEAT loss is demonstrated to promote a smoother empirical risk, consequently enhancing robustness. Experimental results demonstrate that the proposed method successfully mitigates catastrophic overfitting, yielding superior performance amongst efficient defenses. Our single-step method can reach 51% robust accuracy for CIFAR-10 with l∞ perturbations of radius 8/255 under a strong PGD-50 attack, matching the performance of a 10-step iterative method at merely 3% computational cost.

Full Text