Adversarial anchor-guided feature refinement for adversarial defense

Hakmin Lee,Yong Man Ro

doi:10.1016/j.imavis.2023.104722

Abstract

Adversarial training (AT), which is known as a robust training method for defending against adversarial examples, usually loses the performance of models for clean examples due to the feature distribution discrepancy between clean and adversarial. In this paper, we propose a novel Adversarial Anchor-guided Feature Refinement (AAFR) defense method aimed at reducing the discrepancy and delivering reliable performances for both clean and adversarial examples. We devise adversarial anchor that detects whether the feature comes from clean or adversarial example. Then, we use adversarial anchor to refine the feature to reduce the discrepancy. As a result, the proposed method substantially achieves adversarial robustness while preserving the performance for clean examples. The effectiveness of the proposed method is verified with comprehensive experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet datasets.

Full Text