Abstract

Deep neural networks are vulnerable to adversarial attacks, posing serious risks to security-critical applications. Existing adversarial defense algorithms primarily concentrate on optimizing adversarial training strategies to improve the robustness of neural networks, but overlook that misguided decisions ultimately stem from corrupted activation values. Moreover, such strategies typically incur a substantial decline in clean accuracy. To address these issues, we propose a novel RSA algorithm that counteracts adversarial perturbations while maintaining clean accuracy. Specifically, RSA comprises three distinct modules: a feature refinement module, an activation suppression module, and an alignment module. First, the feature refinement module refines malicious activation values in the feature space. Subsequently, the activation suppression module mitigates redundant activation values induced by adversarial perturbations across both channel and spatial dimensions. Finally, to avoid an excessive performance drop on clean samples, RSA incorporates a consistency constraint and a knowledge distillation constraint for feature alignment. Extensive experiments on five public datasets and three backbone networks demonstrate that our proposed algorithm consistently outperforms the state of the art in both adversarial robustness and clean accuracy.
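The abstract only names the three modules and the two alignment constraints; the sketch below is a minimal, hypothetical PyTorch rendering of that structure. All internals (the residual 1x1-conv refinement, the sigmoid-gated channel/spatial suppression, the MSE consistency and KL distillation losses, and the temperature `tau`) are assumptions for illustration, not the paper's actual design.

```python
# Hypothetical sketch of the RSA module structure described in the abstract.
# Layer choices, gating mechanisms, and loss formulations are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureRefinement(nn.Module):
    """Refines activation values in the feature space
    (assumed: a lightweight residual 1x1-conv block)."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.refine(x)


class ActivationSuppression(nn.Module):
    """Suppresses redundant activations along channel and spatial dimensions
    (assumed: sigmoid-gated channel and spatial attention masks)."""
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)  # down-weight suspicious channels
        x = x * self.spatial_gate(x)  # down-weight suspicious locations
        return x


def alignment_loss(feat_adv, feat_clean, logits_student, logits_teacher, tau=4.0):
    """Consistency constraint between adversarial and clean features, plus a
    knowledge-distillation constraint on the logits (assumed formulation)."""
    consistency = F.mse_loss(feat_adv, feat_clean)
    kd = F.kl_div(
        F.log_softmax(logits_student / tau, dim=1),
        F.softmax(logits_teacher / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau
    return consistency + kd
```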
