Abstract

Adversarial Training has proven to be an effective training paradigm for enforcing robustness against adversarial examples in modern neural network architectures. Despite many efforts, explanations of the foundational principles underpinning the effectiveness of Adversarial Training remain limited and far from widely accepted by the Deep Learning community. Moreover, very few works have investigated the limitations of robust Convolutional Neural Networks beyond the well-known accuracy drop on natural images. In this paper, we describe surprising properties of these models, shedding light on the mechanisms through which robustness against adversarial attacks is implemented. We also highlight limitations and failure modes that were not discussed in prior works. Through extensive analyses on a wide range of architectures and datasets, we empirically demonstrate that adversarially-trained Convolutional Neural Networks do not exploit model capacity efficiently and that the simplicity biases induced by Adversarial Training may lead to undesired behaviors.
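For context, Adversarial Training typically augments the standard training loop with an inner maximization step that crafts worst-case perturbations (commonly via Projected Gradient Descent) and then minimizes the loss on those perturbed inputs. The sketch below illustrates this min-max scheme in PyTorch; it is not the exact procedure used in the paper, and the names `model`, `train_loader`, and `optimizer` are placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: find a worst-case perturbation within an L-inf ball of radius eps."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()            # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                               # keep a valid image
    return x_adv.detach()

def adversarial_train_epoch(model, train_loader, optimizer, device="cuda"):
    """Outer minimization: one epoch of training on adversarial examples."""
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)           # craft adversarial examples on the fly
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)   # train on the perturbed batch
        loss.backward()
        optimizer.step()
```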
