Abstract

Recent studies have shown that deep neural networks (DNNs) are vulnerable to backdoor attacks, in which an attacker embeds a hidden backdoor into a model by poisoning a small number of training samples. The attacked model behaves normally on benign samples, but its predictions are maliciously altered whenever the backdoor is activated. To address the suboptimal effectiveness and limited generality of existing backdoor defenses, a self-supervised defense method based on a hybrid self-attention mechanism is proposed. Exploiting the way backdoor attacks operate, the method adopts a decoupling strategy: it severs the association between poisoned samples and the attacker's target labels, and strengthens the link between learned features and clean labels by optimizing the feature extractor. Experiments on the CIFAR-10 and CIFAR-100 datasets show that the method achieves moderate Clean Accuracy (CA), ranking at the median among compared methods, but substantially reduces the Attack Success Rate (ASR); against BadNets and Blended attacks in particular, its defense is notably superior to that of other methods, keeping the attack success rate below 2%.
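The decoupling idea described above can be illustrated with a toy sketch: the feature extractor is fit on inputs only, never on (possibly poisoned) labels, and the classifier head is then fit on a small trusted clean subset. This is an illustrative assumption, not the paper's actual architecture; here plain PCA stands in for the self-supervised encoder and a nearest-centroid head stands in for the classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data: class 0 near (0,0), class 1 near (6,6). The backdoor
# trigger is a fixed additive pattern; poisoned samples carry the
# trigger and are relabelled to the attacker's target class (1).
n = 200
x0 = rng.normal([0, 0], 0.5, size=(n, 2))
x1 = rng.normal([6, 6], 0.5, size=(n, 2))
trigger = np.array([0.0, 2.0])
target = 1

# Poisoned training set: 10% of class-0 samples get the trigger and
# the target label, so an end-to-end supervised model could learn the
# trigger -> target shortcut.
X = np.vstack([x0, x1])
y_poisoned = np.array([0] * n + [1] * n)
idx = rng.choice(n, n // 10, replace=False)
X[idx] += trigger
y_poisoned[idx] = target

# Stage 1 (decoupling): fit the "feature extractor" on inputs alone,
# never on labels, so the trigger-label association cannot be learned.
mu = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mu, full_matrices=False)
extract = lambda z: (z - mu) @ vt.T  # label-free feature map

# Stage 2: fit a simple head (nearest centroid) on a small trusted
# clean subset, linking features to *clean* labels only.
clean_idx = np.concatenate(
    [np.setdiff1d(np.arange(n), idx)[:50], np.arange(n, n + 50)])
feats = extract(X[clean_idx])
labels = y_poisoned[clean_idx]
centroids = np.array([feats[labels == c].mean(axis=0) for c in (0, 1)])
predict = lambda z: np.argmin(
    np.linalg.norm(extract(z)[:, None, :] - centroids, axis=2), axis=1)

# Clean Accuracy (CA) on fresh samples; Attack Success Rate (ASR) on
# triggered class-0 samples aimed at the target class.
test0 = rng.normal([0, 0], 0.5, size=(n, 2))
test1 = rng.normal([6, 6], 0.5, size=(n, 2))
ca = np.mean(np.concatenate([predict(test0) == 0, predict(test1) == 1]))
asr = np.mean(predict(test0 + trigger) == target)
print(f"CA={ca:.2f}  ASR={asr:.2f}")
```

Because the feature map never sees the poisoned labels and the head is fit only on clean labels, triggered samples are still classified by their true content, so the toy ASR stays near zero while CA remains high.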
