Abstract

Adversarial training (AT) is an effective approach to making deep neural networks robust against adversarial attacks. Recently, different AT defenses are proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks, such as projected gradient descent (PGD). High adversarial robustness can also arise if an attack fails to find adversarial gradient directions, a phenomenon known as "gradient masking." In this work, we analyze the effect of label smoothing on AT as one of the potential causes of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed guided projected gradient attack (G-PGA). Our attack approach is based on a "match and deceive" loss that finds optimal adversarial directions through guidance from a surrogate model. Our modified attack does not require random restarts a large number of attack iterations or a search for optimal step size. Furthermore, our proposed G-PGA is generic, thus it can be combined with an ensemble attack strategy as we demonstrate in the case of auto-attack, leading to efficiency and convergence speed improvements. More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.