Abstract

Backdoor attack to deep neural networks (DNNs) is among the predominant approaches to bring great threats into artificial intelligence. The existing methods to detect backdoor attacks focus on the perspective of distributions in DNNs, however, limited by its ability of generalization across DNN models. In this article, a critical-path-based backdoor detector (CPBD) is proposed, which approaches to detect backdoor attacks via DNN's interpretability. CPBD is designed to efficiently discover the characteristics of backdoors, which distinguish the critical paths in the attacked DNNs. To deal with the intractably large number of neurons, we propose to simplify the neurons, and the preserved key nodes are integrated into a set of critical paths. Thus, a DNN model can be formulated as a combination of several critical paths. Afterward, the detection of backdoors is performed based on the analysis of critical paths corresponding to different classes. Then, combining all the above steps, the CPBD algorithm is integrated to present the results in a standard and systematic manner. In addition, CPBD is able to locate neurons associated with malicious triggers, the combination of which is named as trigger propagation path. Extensive experiments are conducted, which testify the efficiency of the proposed method on multiple DNNs and different trigger sizes.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call