The widespread deployment of deep neural networks (DNNs) in critical real-time applications has spurred significant research into their security and robustness. A key vulnerability is that DNN decisions can be maliciously altered by adding carefully crafted perturbations to the input data, causing erroneous predictions; such inputs are known as adversarial attacks. In this paper, we propose a novel detection framework that leverages segmentation masks and image segmentation techniques to identify adversarial attacks on DNNs, particularly in the context of autonomous driving systems. Our defense operates at two levels of adversarial detection. The first level primarily detects adversarial inputs with large perturbations using a U-net model and a one-class support vector machine (SVM). The second level introduces a dynamic segmentation algorithm based on k-means clustering together with a verifier model that controls the final prediction for the input image. To evaluate our approach, we comprehensively compare our method with the state-of-the-art feature squeezing method in a white-box attack setting, using eleven distinct adversarial attacks across three heterogeneous benchmark data sets. The experimental results demonstrate the efficacy of our framework, achieving overall detection rates exceeding 96% across all adversarial techniques and data sets studied. Notably, our method improves the detection of FGSM and BIM attacks, reaching an average detection rate of 95.65% across the three data sets, compared with 62.63% for feature squeezing.
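To illustrate the first detection level described above, the following is a minimal sketch, assuming that per-image statistics derived from U-net segmentation masks are summarized into fixed-length feature vectors and scored by a one-class SVM fitted only on clean inputs; the `unet_mask_features` helper, the placeholder features, and all numeric settings are hypothetical and not taken from the paper.

```python
# Minimal sketch (assumptions noted above): a one-class SVM trained on features
# of clean inputs flags out-of-distribution (potentially adversarial) inputs.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def unet_mask_features(images):
    """Hypothetical stand-in for per-image statistics of U-net segmentation masks."""
    return rng.normal(loc=0.0, scale=1.0, size=(len(images), 16))

clean_train = list(range(200))                 # placeholder for clean training images
clean_feats = unet_mask_features(clean_train)  # features of clean inputs only

detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
detector.fit(clean_feats)

# At inference time, +1 means "looks clean", -1 means "flag as adversarial".
test_feats = unet_mask_features(list(range(10)))
flags = detector.predict(test_feats)
print(flags)
```

In the full framework, inputs flagged here would proceed to the second level, where k-means-based dynamic segmentation and the verifier model control the final prediction.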