Abstract

Applications based on deep neural networks (DNNs) have grown exponentially in the past decade. To match their increasing computational needs, several nonvolatile memory (NVM) crossbar-based accelerators have been proposed. Recently, researchers have shown that, apart from improved energy efficiency and performance, such approximate hardware also possesses intrinsic robustness that can serve as a defense against adversarial attacks. Prior works have focused on quantifying this intrinsic robustness for vanilla networks, that is, DNNs trained on unperturbed inputs. However, adversarial training of DNNs, i.e., training with adversarially perturbed images, is the benchmark technique for robustness, and sole reliance on the intrinsic robustness of the hardware may not be sufficient. In this work, we explore the design of robust DNNs through the amalgamation of adversarial training and the intrinsic robustness offered by NVM crossbar-based analog hardware. First, we study the noise stability of such networks on unperturbed inputs and observe that internal activations of adversarially trained networks have a lower signal-to-noise ratio (SNR) and are more sensitive to noise than those of vanilla networks. As a result, they suffer significantly higher performance degradation due to the approximate computations on analog hardware: on average, a <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$2\times $ </tex-math></inline-formula> accuracy drop. Noise stability analyses clearly show the instability of adversarially trained DNNs.
On the other hand, for adversarial images generated using Square black-box attacks, ResNet-10/20 adversarially trained on CIFAR-10/100 display a robustness improvement of 20%–30% under high <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\epsilon _{\mathrm{ attack}}$ </tex-math></inline-formula> (degree of input perturbation). For adversarial images generated using projected-gradient-descent (PGD) white-box attacks, the adversarially trained DNNs present a 5%–10% gain in robust accuracy due to the underlying NVM crossbar when <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\epsilon _{\mathrm{ attack}}$ </tex-math></inline-formula> is greater than the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">epsilon</italic> of the adversarial training ( <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\epsilon _{\mathrm{ train}}$ </tex-math></inline-formula> ). Our results indicate that implementing adversarially trained networks on analog hardware requires careful calibration between hardware nonidealities and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\epsilon _{\mathrm{ train}}$ </tex-math></inline-formula> to achieve optimum robustness and performance.
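For readers unfamiliar with the PGD white-box attack referenced above, the following is a minimal toy sketch of the L-infinity PGD update: ascend in the sign of the loss gradient, then project back onto the epsilon-ball around the original input. The linear scorer standing in for the DNN, and the function and parameter names, are hypothetical illustrations, not the paper's actual attack setup.

```python
# Toy L-infinity PGD sketch. The "model" is a hypothetical linear
# scorer f(x) = w . x, so the gradient of f with respect to x is just w;
# a real attack would backpropagate through the DNN's loss instead.
def pgd_attack(x0, w, eps, alpha, steps):
    """Maximize f(x) = w . x within an L-inf ball of radius eps around x0."""
    x = list(x0)
    for _ in range(steps):
        grad = w  # gradient of w . x with respect to x
        # Ascent step in the sign of the gradient.
        x = [xi + alpha * (1 if g > 0 else -1 if g < 0 else 0)
             for xi, g in zip(x, grad)]
        # Project back onto the eps-ball around x0 (the attack budget).
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
        # Keep the input in the valid pixel range [0, 1].
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x

# With eps = 0.1 and alpha = 0.05, the perturbation saturates the
# budget after two steps: each coordinate moves by at most eps.
adv = pgd_attack([0.5, 0.5], [1.0, -1.0], eps=0.1, alpha=0.05, steps=10)
```

The epsilon here plays the role of the abstract's perturbation degree: the attack-time budget corresponds to the attack epsilon, while using a (typically smaller) budget during training corresponds to the training epsilon.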
