Abstract

Although Deep Neural Networks (DNNs) have achieved great success in various applications, investigations have increasingly shown DNNs to be highly vulnerable when adversarial examples are used as input. Here, we present a comprehensive defense framework to protect DNNs against adversarial examples. First, we present statistical and minor-alteration detectors to filter out adversarial examples contaminated by noticeable and unnoticeable perturbations, respectively. Then, we ensemble the detectors, a deep Residual Generative Network (ResGN), and an adversarially trained targeted network to construct a complete defense framework. In this framework, the ResGN is our previously proposed network for removing adversarial perturbations, and the adversarially trained targeted network is a network learned through adversarial training. Specifically, once the detectors determine an input example to be adversarial, it is cleaned by the ResGN and then classified by the adversarially trained targeted network; otherwise, it is directly classified by this network. We empirically evaluate the proposed complete defense on the ImageNet dataset. The results confirm its robustness against current representative attacking methods including the fast gradient sign method, randomized fast gradient sign method, basic iterative method, universal adversarial perturbations, DeepFool method, and Carlini & Wagner method.
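As a rough illustration of how the pieces fit together, the following minimal sketch routes an input through the detect, clean, classify pipeline described above. It assumes PyTorch and placeholder callables (detectors, resgn, classifier) standing in for the paper's detectors, ResGN, and adversarially trained targeted network; these names are illustrative, not the paper's implementation.

    import torch

    def defend_and_classify(x, detectors, resgn, classifier):
        """Route an input through the complete defense: detect, clean, classify."""
        # If any detector flags the example as adversarial, clean it with ResGN first.
        if any(d(x) for d in detectors):
            x = resgn(x)  # remove the estimated adversarial perturbation
        # Classify with the adversarially trained targeted network.
        with torch.no_grad():
            logits = classifier(x)
        return logits.argmax(dim=-1)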

Highlights

  • The performance of deep neural networks (DNNs) on various applications, such as computer vision [1], natural language processing [2], and speech recognition [3], has been impressive

  • We evaluate the detectors in terms of True Positive Rate (TPR) and False Positive Rate (FPR). The results indicate that the TPR for adversarial examples generated by the Fast Gradient Sign Method (FGSM), Randomized Fast Gradient Sign Method (R-FGSM), Basic Iterative Method (BIM), and Universal Adversarial Perturbations (UAP) is close to 100%, and the FPR for legitimate examples is below 1% (see the TPR/FPR sketch after this list)

  • For the detectors whose positive set is composed of adversarial examples crafted by DeepFool, CW_UT, or CW_T, neither the TPR nor the FPR is satisfactory. The results imply that the Subtractive Pixel Adjacency Matrix (SPAM)-based feature is not appropriate for characterizing unnoticeable perturbations
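For reference, the TPR and FPR reported above are TP/(TP+FN) and FP/(FP+TN). The short sketch below makes the bookkeeping explicit; the score arrays and threshold are hypothetical, not values from the paper.

    def detection_rates(scores_adv, scores_clean, threshold):
        """TPR/FPR of a detector that calls an example adversarial when its
        detection score exceeds `threshold` (illustrative bookkeeping only)."""
        tp = sum(s > threshold for s in scores_adv)    # adversarial correctly flagged
        fn = len(scores_adv) - tp                      # adversarial missed
        fp = sum(s > threshold for s in scores_clean)  # legitimate wrongly flagged
        tn = len(scores_clean) - fp                    # legitimate correctly passed
        return tp / (tp + fn), fp / (fp + tn)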



Introduction

The performance of deep neural networks (DNNs) on various applications, such as computer vision [1], natural language processing [2], and speech recognition [3], has been impressive. However, it has been found that carefully crafted adversarial examples deceive the targeted network even though they do not affect human recognition (see Figure 1). These adversarial examples are serious potential threats to security-sensitive applications such as autonomous vehicle systems [8] and face recognition [9]. With respect to the first type of defense, one of the most effective strategies is to (re)train the targeted network with adversarial examples to obtain an adversarially trained targeted network. Each of these approaches has its respective limitations. In the white-box setting, the adversary knows the structure and parameters of the targeted network, the training data, and even the defensive scheme of the defender. We review widely used white-box attacks.
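As a concrete example of such a white-box attack, the single-step FGSM perturbs an input along the sign of the gradient of the classification loss. The sketch below is a generic PyTorch rendering under assumed settings (a differentiable model, step size eps, and pixel values in [0, 1]); it is not the paper's configuration.

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps):
        """Fast Gradient Sign Method: one-step white-box attack that moves the
        input along the sign of the gradient of the classification loss."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # x_adv = x + eps * sign(grad_x L(theta, x, y)), kept in the valid pixel range
        return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()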

