Abstract

Deep neural networks (DNNs) achieve excellent performance in many applications, especially image classification. However, DNNs are also vulnerable to backdoor attacks, which embed a hidden backdoor into a model so that the infected model classifies benign images correctly while misclassifying images carrying the backdoor trigger as the target label. To obtain a clean model from a poisoned dataset, we propose a Kalman-filtering-based multi-scale inactivation scheme that effectively removes poisoned samples from the dataset. Each sample in the suspicious training set is judged by multi-scale inactivation, yielding a series of judgments, which are then fused with Kalman filtering to decide whether the sample is poisoned. To further improve performance, we also propose a scheme based on trigger localization and target determination. Extensive experiments demonstrate the effectiveness of the proposed methods: poisoned samples are removed with a recall above 99%, and the attack success rate against the retrained clean model falls below 1%.
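To make the fusion step concrete, below is a minimal sketch of how per-sample judgments could be combined with a one-dimensional Kalman filter. It assumes each sample receives a noisy scalar "poison score" in [0, 1] from each inactivation scale and models the true score as constant across scales; the function names (`kalman_fuse`, `is_poison`), the measurement model, and the noise and decision parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kalman_fuse(scores, process_var=1e-3, meas_var=0.1):
    """Fuse a sequence of per-scale poison scores with a 1-D Kalman filter.

    scores: noisy judgments in [0, 1] from multi-scale inactivation,
            one per scale (hypothetical interface; the abstract does not
            specify the paper's exact measurement model).
    Returns the final fused state estimate.
    """
    x, p = scores[0], 1.0          # initial state estimate and variance
    for z in scores[1:]:
        # Predict: the state is assumed constant, so only uncertainty grows.
        p = p + process_var
        # Update: blend the prediction with the new measurement z.
        k = p / (p + meas_var)     # Kalman gain
        x = x + k * (z - x)
        p = (1.0 - k) * p
    return x

def is_poison(scores, threshold=0.5):
    """Flag a sample as poisoned if the fused score exceeds a threshold."""
    return kalman_fuse(np.asarray(scores, dtype=float)) > threshold

# Example: judgments from four inactivation scales for two samples.
print(is_poison([0.7, 0.8, 0.6, 0.9]))    # -> True  (likely poisoned)
print(is_poison([0.1, 0.2, 0.05, 0.15]))  # -> False (likely benign)
```

The appeal of this fusion rule is that the Kalman gain automatically down-weights noisy individual judgments, so a single spurious score at one scale is unlikely to flip the final decision.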
