Security and privacy are pressing concerns in building reliable learning-based systems. Recent studies have shown that such systems are susceptible to subtle adversarial perturbations applied to their inputs: although these perturbations are difficult for humans to detect, they can easily mislead deep learning classifiers. Noise injection, as a defense mechanism, can offer provable protection against adversarial attacks by reducing sensitivity to small input changes; however, such methods suffer from high computational complexity and limited adaptability. Drawing inspiration from filter-based image denoising techniques, we propose a multilayer filter defense model that inserts a filtering layer between the input layer and the first convolutional layer and incorporates noise injection during training. This model substantially enhances the resilience of image classification systems to adversarial attacks. We also investigate how different filter combinations, filter sizes, standard deviations, and numbers of filter layers affect defense effectiveness. Experimental results show that, across the MNIST, CIFAR10, and CIFAR100 datasets, the multilayer filter defense model achieves the highest average accuracy when using a double-layer Gaussian filter (3×3 filter size, standard deviation of 1). Compared with two filter-based defense models, our method attains an average accuracy of 71.9%, effectively enhancing the robustness of image recognition classifiers against adversarial attacks. The method not only performs well on small-scale datasets but also remains robust on a larger-scale dataset (miniImageNet) and on modern models (EfficientNet and WideResNet).
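The defense described above (a fixed Gaussian filtering stage placed between the input and the first convolutional layer, combined with noise injection during training) can be illustrated with a short sketch. The following PyTorch code is not the authors' implementation; the module names, the noise standard deviation, and the wrapping of an arbitrary backbone are assumptions, while the double-layer 3×3 Gaussian filter with standard deviation 1 follows the abstract.

```python
# Minimal sketch (assumed, not the authors' code): a double Gaussian filtering
# stage inserted between the input and the first convolutional layer, with
# Gaussian noise injection applied only during training.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(size: int = 3, sigma: float = 1.0) -> torch.Tensor:
    """Normalized 2-D Gaussian kernel (3x3, sigma=1, per the abstract)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return kernel / kernel.sum()


class GaussianFilterLayer(nn.Module):
    """Fixed (non-trainable) depthwise Gaussian smoothing of the input."""

    def __init__(self, channels: int, size: int = 3, sigma: float = 1.0):
        super().__init__()
        weight = gaussian_kernel(size, sigma).repeat(channels, 1, 1, 1)
        self.register_buffer("weight", weight)
        self.channels = channels
        self.padding = size // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.weight, padding=self.padding, groups=self.channels)


class FilteredClassifier(nn.Module):
    """Backbone classifier preceded by two stacked Gaussian filter layers."""

    def __init__(self, backbone: nn.Module, channels: int = 3, noise_std: float = 0.1):
        super().__init__()
        self.filters = nn.Sequential(           # double-layer Gaussian filter
            GaussianFilterLayer(channels, size=3, sigma=1.0),
            GaussianFilterLayer(channels, size=3, sigma=1.0),
        )
        self.backbone = backbone
        self.noise_std = noise_std               # assumed noise-injection strength

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:                        # noise injection during training only
            x = x + self.noise_std * torch.randn_like(x)
        x = self.filters(x)                      # smooth before the first conv layer
        return self.backbone(x)
```

In this sketch, any classifier (for example a WideResNet or EfficientNet instance) could be wrapped as FilteredClassifier(backbone, channels=3), leaving the backbone itself unchanged.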