Abstract

While deep networks have contributed to major leaps in raw performance across various applications, they are also known to be quite brittle to targeted data perturbations. By adding a small amount of adversarial noise to the data, it is possible to drastically change the output of a deep network. The existence of these so-called adversarial examples, perturbed data points which fool the model, poses a serious risk for safety- and security-centric applications where reliability and robustness are critical. In this dissertation, we present and analyze a number of approaches for mitigating the effect of adversarial examples, also known as adversarial defenses. These defenses can offer varying degrees and types of robustness, and we study defenses which differ in the strength of the robustness guarantee, the efficiency and simplicity of the defense, and the type of perturbation being defended against.

We start with the strongest type of guarantee, provable adversarial defenses, showing that it is possible to compute duality-based certificates that guarantee no adversarial examples exist within an ℓp-bounded region. These certificates are trainable and can be minimized to learn networks which are provably robust to adversarial attacks. The approach is agnostic to the specific architecture, is applicable to arbitrary computational graphs, and scales to medium-sized convolutional networks with random projections.

We then switch gears to developing a deeper understanding of a more empirical defense known as adversarial training. Although adversarial training does not come with formal guarantees, it can learn networks more efficiently and with better empirical performance against attacks. We study the optimization process and reveal several intriguing properties of the robust learning problem, finding that a simple modification to one of the earliest adversarial attacks can be sufficient to learn networks robust to much stronger attacks, and that adversarial training as a general procedure is highly susceptible to overfitting. These discoveries have significant implications for both the efficiency of adversarial training and the state of the field: for example, virtually all recent algorithmic improvements in adversarial training can be matched by simply using early stopping.

The final component of this dissertation expands the realm of adversarial examples beyond ℓp-norm bounded perturbations, enabling more realistic threat models for applications beyond imperceptible noise. We define a threat model called the Wasserstein adversarial example, which captures semantically meaningful image transformations such as translations and rotations that are not captured by existing threat models. We present an efficient algorithm for projecting onto Wasserstein balls, enabling both the generation of and adversarial training against Wasserstein adversarial examples. Finally, we demonstrate how to generalize adversarial training to defend against multiple types of threats simultaneously, improving upon naive aggregations of adversarial attacks.
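To make the adversarial training discussion above concrete, the following is a minimal sketch (not the dissertation's own code) of the idea that a single-step FGSM attack with a random starting point can serve as the inner maximization during training. It assumes a standard PyTorch image classifier with inputs in [0, 1]; the names model, loader, optimizer, eps, and alpha are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_rs_train_epoch(model, loader, optimizer, eps=8/255, alpha=10/255, device="cuda"):
    """One epoch of adversarial training using random-start FGSM as the attack."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)

        # Random initialization inside the l_inf ball of radius eps around the input.
        delta = torch.empty_like(x).uniform_(-eps, eps)
        delta.requires_grad_(True)

        # Single FGSM step: ascend the loss in the sign of the gradient, then
        # project the perturbation back onto the l_inf ball.
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0.0, 1.0)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = torch.clamp(delta + alpha * grad.sign(), -eps, eps).detach()

        # Standard training step on the perturbed batch.
        optimizer.zero_grad()
        adv_loss = F.cross_entropy(model(torch.clamp(x + delta, 0.0, 1.0)), y)
        adv_loss.backward()
        optimizer.step()
```

As noted in the abstract, stopping such training early (based on robust validation accuracy) is also important, since the robust learning problem is highly susceptible to overfitting.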
