Abstract
Adversarial training is an effective way to defend deep neural networks (DNNs) against adversarial examples. However, training sets contain atypical samples that are rare and hard to learn, and that can even hurt DNNs' generalization performance on test data. In this paper, we propose a novel algorithm that reweights the training samples using self-supervised techniques to mitigate the negative effects of these atypical samples. Specifically, a memory bank is built to record popular samples as prototypes, and a memorization weight is calculated for each sample to evaluate its "typicalness". All training samples are then reweighted according to these memorization weights to reduce the influence of atypical samples. Experimental results show that the proposed method can flexibly boost state-of-the-art adversarial training methods, improving both the robustness and the standard accuracy of DNNs.
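The sketch below illustrates the general idea in PyTorch, under stated assumptions: the abstract does not specify the paper's exact weighting scheme, so the cosine-similarity-to-prototype scoring, the softmax mapping with temperature `tau`, and the function names `memorization_weights` and `reweighted_adversarial_loss` are all illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn.functional as F

def memorization_weights(features, prototypes, tau=0.1):
    """Hypothetical sketch: score each sample's "typicalness" as its
    maximum cosine similarity to the prototype features stored in a
    memory bank; higher similarity means more typical."""
    feats = F.normalize(features, dim=1)     # (B, d) sample features
    protos = F.normalize(prototypes, dim=1)  # (K, d) memory-bank prototypes
    sim = feats @ protos.t()                 # (B, K) cosine similarities
    typicalness = sim.max(dim=1).values      # similarity to closest prototype
    # Assumption: map scores to positive weights that average to 1;
    # tau controls how sharply atypical samples are down-weighted.
    weights = torch.softmax(typicalness / tau, dim=0) * features.size(0)
    return weights.detach()

def reweighted_adversarial_loss(logits_adv, targets, weights):
    """Weight the per-sample adversarial training loss so that atypical
    samples (far from every prototype) contribute less to the update."""
    per_sample = F.cross_entropy(logits_adv, targets, reduction="none")
    return (weights * per_sample).mean()
```

In a training loop, `weights` would be computed from the current batch's features against the memory bank and then plugged into whichever adversarial loss (e.g., PGD-based) the base method uses; detaching the weights keeps the reweighting from interfering with gradient flow through the loss itself.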