Abstract

Deep neural networks (DNNs) have been shown to be vulnerable to backdoor attacks during training. Most existing backdoor defenses are designed for specific types of backdoor attacks, and backdoor detection and backdoor mitigation are usually treated as separate problems; few general, complete defense frameworks currently exist. In this paper, we propose a lightweight, general, and complete defense framework against three main types of backdoor attacks. It efficiently detects poisoned images and removes their trigger patterns without costly retraining of the backdoored model. First, we exploit the difference between the middle-layer features of clean and poisoned samples to distinguish the two. Then, we apply an image inpainting algorithm to remove the trigger pattern from the poisoned samples. We deploy three of the most popular backdoor attacks on three datasets to evaluate the effectiveness of our defense. Extensive experimental results show that our method effectively defends against various backdoor attacks at relatively small cost. In particular, we reduce the attack success rate of the stealthier clean-label poisoning attack from 94.9% to 0.02% with little impact on the classification accuracy of the inpainted images.
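To make the two-stage pipeline concrete, the following is a minimal sketch of the idea, not the authors' released code: poisoned inputs are flagged by how far their intermediate-layer features deviate from clean-sample statistics, and the suspected trigger region is then scrubbed with classical image inpainting. The choice of model (ResNet-18), hooked layer (layer3), the z-score threshold, and the availability of a trigger mask are all illustrative assumptions; the paper's actual feature statistic and inpainting algorithm are not reproduced here.

    import cv2
    import numpy as np
    import torch
    import torchvision.models as models

    # Assumed stand-in for the backdoored classifier under inspection.
    model = models.resnet18(weights=None)
    model.eval()

    features = {}

    def hook(_module, _inp, out):
        features["mid"] = out.detach()

    # Hook one intermediate layer; which layer works best is an assumption here.
    model.layer3.register_forward_hook(hook)

    @torch.no_grad()
    def mid_features(x: torch.Tensor) -> torch.Tensor:
        model(x)
        return features["mid"].flatten(1)  # shape (N, D)

    def fit_clean_stats(clean_batch: torch.Tensor):
        # Per-dimension mean and std of middle-layer features on trusted clean data.
        f = mid_features(clean_batch)
        return f.mean(0), f.std(0) + 1e-8

    @torch.no_grad()
    def is_poisoned(x: torch.Tensor, mu, sigma, thresh: float = 2.0) -> torch.Tensor:
        # Flag samples whose RMS z-score against the clean statistics exceeds a
        # threshold (thresh is an assumed hyperparameter, not from the paper).
        z = (mid_features(x) - mu) / sigma
        return z.norm(dim=1) / (z.shape[1] ** 0.5) > thresh

    def remove_trigger(img_bgr: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # Inpaint the suspected trigger region (mask: uint8, 255 = trigger pixels).
        # How the mask is localized is omitted; it is assumed given here.
        return cv2.inpaint(img_bgr, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

In use, one would fit the clean statistics once on a small trusted set, route each incoming image through is_poisoned, and only inpaint the flagged ones, which is what lets the defense avoid retraining the model.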
