Abstract

Deep neural networks (DNNs) have recently been found to be vulnerable to adversarial examples, which raises concerns about their reliability and poses potential security threats. Adversarial training has been extensively studied as a countermeasure against adversarial attacks. However, the limited set of attack types incorporated during the training phase restricts a model's defense performance against unknown attacks and degrades its standard accuracy. Furthermore, we find that adversarially trained models tend to overfit to redundant noisy features, which hinders their generalization. To alleviate these issues, this paper proposes the attention information bottleneck-guided knowledge distillation (AIB-KD) method to enhance models' adversarial robustness. We integrate adversarial training with an attention information bottleneck as the defense framework to achieve an optimal trade-off between information compression and classification performance. Simultaneously, we employ knowledge distillation to guide the adversarially trained models in learning both standard attention information and valuable deep feature distributions, enhancing their defense generalization capability. Experimental results demonstrate that AIB-KD can effectively classify adversarial examples under multiple attack settings. The average white-box and black-box classification accuracies for the WideResNet-28-10 model on the CIFAR-10 dataset are 56.59% and 85.49%, respectively, and the average accuracies on the SVHN dataset are 61.71% and 88.96%. When applied to unknown attack scenarios, AIB-KD is more effective and interpretable than state-of-the-art methods.
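As context for the vulnerability the abstract describes, the following is a minimal, hypothetical sketch of crafting an adversarial example with the fast gradient sign method (FGSM) on a toy logistic-regression model. It is only an illustration of the general attack idea, not the paper's AIB-KD method or the attacks evaluated in its experiments; all names and parameters here are assumptions.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """Perturb input x by eps * sign(grad of the loss w.r.t. x).

    Toy logistic-regression model: p = sigmoid(x @ w + b),
    binary cross-entropy loss against label y in {0, 1}.
    """
    z = x @ w + b                      # logit
    p = 1.0 / (1.0 + np.exp(-z))      # predicted probability of class 1
    grad_x = (p - y) * w               # d(BCE)/dx for logistic regression
    return x + eps * np.sign(grad_x)   # FGSM: step in the loss-increasing direction

rng = np.random.default_rng(0)
w = rng.normal(size=4)                 # hypothetical trained weights
b = 0.0
x = rng.normal(size=4)                 # a clean input with true label y = 1
y = 1.0
x_adv = fgsm_perturb(x, w, b, y, eps=0.1)

# The perturbation is bounded by eps in the L-infinity norm,
# yet it moves the logit toward the wrong class.
print(np.max(np.abs(x_adv - x)))
print(x @ w + b, x_adv @ w + b)
```

Adversarial training, which the paper builds on, repeatedly generates such perturbed inputs during training and includes them in the loss, so the model learns to classify them correctly.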
