Abstract

Deep neural networks (DNNs) are widely used to handle many difficult tasks, such as image classification and malware detection, and achieve outstanding performance. However, recent studies on adversarial examples, which are crafted by adding small, undetectable perturbations to original samples so that they remain indistinguishable to human eyes yet mislead machine learning models, show that such models are vulnerable to security attacks. Although various adversarial retraining techniques have been developed in the past few years, none of them is scalable. In this paper, we propose a new iterative adversarial retraining approach that robustifies the model and reduces the effectiveness of adversarial inputs on DNN models. The proposed method retrains the model with both Gaussian noise augmentation and adversarial example generation for better generalization. Furthermore, an ensemble model is used during the testing phase to increase the robust test accuracy. The results of our extensive experiments demonstrate that the proposed approach increases the robustness of the DNN model against various adversarial attacks, specifically the Fast Gradient Sign Method (FGSM) attack, the Carlini and Wagner (C&W) attack, the Projected Gradient Descent (PGD) attack, and the DeepFool attack. More precisely, the robust classifier obtained by our approach maintains an accuracy of 99% on average on the standard test set. Moreover, we empirically evaluate the runtime of two of the most effective adversarial attacks, the C&W attack and the Basic Iterative Method (BIM) attack, and find that the C&W attack can exploit the GPU to generate adversarial examples faster than the BIM attack. For this reason, we further develop a parallel implementation of the proposed approach, which makes it scalable to large datasets and complex models.
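The paper's pipeline is not reproduced here, but the following is a minimal PyTorch sketch of one retraining epoch combining the two ingredients named above, Gaussian noise augmentation and adversarial example generation (FGSM is used as the generator for brevity). The names `retrain_epoch`, `sigma`, and `epsilon` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def retrain_epoch(model, loader, optimizer, sigma=0.1, epsilon=0.1):
    """One epoch of retraining on clean, noise-augmented, and
    FGSM-adversarial batches (a sketch, not the paper's code)."""
    model.train()
    for images, labels in loader:
        # Gaussian noise augmentation for better generalization.
        noisy = (images + sigma * torch.randn_like(images)).clamp(0.0, 1.0)

        # FGSM adversarial example generation: one signed-gradient step.
        images.requires_grad_(True)
        loss = F.cross_entropy(model(images), labels)
        grad = torch.autograd.grad(loss, images)[0]
        adversarial = (images + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

        # Retrain on the union of clean, noisy, and adversarial batches.
        batch = torch.cat([images.detach(), noisy, adversarial])
        targets = torch.cat([labels, labels, labels])
        optimizer.zero_grad()
        F.cross_entropy(model(batch), targets).backward()
        optimizer.step()
```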

Highlights

  • Deep learning has been widely deployed in image classification [1,2,3], natural language processing [4,5,6], malware detection [7,8,9], self-driving cars [10,11], robots [12], etc.

  • Adversarial examples are obtained by adding a small, undetectable perturbation to original samples in order to mislead a deep neural network (DNN) model into making a wrong classification (see the FGSM sketch after this list).

  • We consider the accuracies of the classifiers on the normal MNIST and CIFAR-10 test images as well as their accuracies under the Fast Gradient Sign Method (FGSM), Carlini and Wagner (C&W), Basic Iterative Method (BIM), and DeepFool attacks, respectively.
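As a concrete illustration of such an evaluation, the sketch below implements FGSM, the simplest of the four attacks, and measures clean versus robust test accuracy. Here `model` and `test_loader` are hypothetical placeholders, and C&W, BIM, and DeepFool would each replace the single-step perturbation with their own iterative procedure.

```python
import torch
import torch.nn.functional as F

def fgsm(model, images, labels, epsilon=0.1):
    # One signed-gradient step in the direction that increases the loss.
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad = torch.autograd.grad(loss, images)[0]
    return (images + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

def accuracy(model, loader, attack=None):
    # Clean accuracy when attack is None, robust accuracy otherwise.
    model.eval()
    correct = total = 0
    for images, labels in loader:
        if attack is not None:
            images = attack(model, images, labels)
        with torch.no_grad():
            preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total

# Usage (hypothetical names):
# clean_acc  = accuracy(model, test_loader)
# robust_acc = accuracy(model, test_loader, attack=fgsm)
```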

Introduction

Deep learning has been widely deployed in image classification [1,2,3], natural language processing [4,5,6], malware detection [7,8,9], self-driving cars [10,11], robots [12], etc. The state-of-the-art performance of image classification on the ImageNet dataset increased from 73.8% in 2011 to 98.7% (top-5 accuracy) in 2020 using deep learning models. In image classification, an image is fed into the neural network, and the convolutional layers extract the important features from the image directly. This makes deep learning desirable for many complex tasks, such as natural language processing and image classification, where software engineers have difficulty writing explicit rules for a computer to perform such tasks.
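To make that feature-extraction pipeline concrete, below is a minimal convolutional classifier sketched in PyTorch for 28x28 grayscale inputs such as MNIST. The architecture and the `SmallCNN` name are illustrative assumptions, not the network used in the paper.

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Convolutional layers learn the image features directly.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        # A linear head maps the extracted features to class scores.
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```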
