Abstract

Adversarial Training has proved to be an effective training paradigm for enforcing robustness against adversarial examples in modern neural network architectures. Despite many efforts, explanations of the foundational principles underpinning the effectiveness of Adversarial Training remain limited and are far from widely accepted by the Deep Learning community. Moreover, very few research works have investigated the limitations of robust Convolutional Neural Networks beyond the well-known accuracy drop on natural images. In this paper, we describe surprising properties of these models, shedding light on the mechanisms through which robustness against adversarial attacks is implemented. We also highlight limitations and failure modes that prior works did not discuss. Through extensive analyses on a wide range of architectures and datasets, we empirically demonstrate that adversarially-trained Convolutional Neural Networks do not exploit model capacity efficiently and that the simplicity biases induced by Adversarial Training may lead to undesired behaviors.

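For context, Adversarial Training is typically formulated as a min-max problem: an inner maximization crafts worst-case perturbations (e.g., via PGD), and an outer minimization updates the model on those perturbed inputs. The sketch below illustrates this standard PGD-based recipe (in the style of Madry et al.); it is not taken from the paper, and the hyperparameters (`eps`, `alpha`, `steps`) and helper names (`pgd_attack`, `adversarial_training_epoch`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: find an L-infinity bounded perturbation of x
    that (approximately) maximizes the cross-entropy loss."""
    x_adv = x.detach() + torch.empty_like(x).uniform_(-eps, eps)
    x_adv = torch.clamp(x_adv, 0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend along the gradient sign, then project back into the eps-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, device="cuda"):
    """Outer minimization: one epoch of training on adversarial examples
    generated on the fly for each mini-batch."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```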