Recent studies have shown that many machine learning models are vulnerable to adversarial attacks. Much remains unknown concerning the generalization error of deep neural networks (DNNs) for adversarial learning. In this paper, we study the generalization of DNNs for robust adversarial learning with $\ell_\infty$ attacks, particularly focusing on attacks produced by the fast gradient sign method. We establish a tight bound for the adversarial Rademacher complexity of DNNs based on both the spectral norms and the ranks of the weight matrices. The spectral norm and rank constraints imply that this class of networks can be realized as a subset of the class of shallow networks composed with a low-dimensional Lipschitz continuous function. This crucial observation leads to a bound that improves the dependence on the network width compared with previous works and achieves depth independence. For general machine learning tasks, we show that the adversarial Rademacher complexity is always larger than its natural counterpart, but the effect of adversarial perturbations can be limited under our weight normalization framework. Our theoretical findings are also confirmed by experiments.
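The fast gradient sign method referenced above produces an $\ell_\infty$-bounded perturbation via a single gradient step, $x' = x + \epsilon\,\mathrm{sign}(\nabla_x \ell(f(x), y))$. Below is a minimal PyTorch sketch of this attack for illustration only; the function name `fgsm_perturb` and the choice of cross-entropy loss are assumptions, not the paper's exact experimental setup.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps):
    """One-step FGSM: perturb x along the sign of the input gradient,
    keeping the perturbation within an l_inf ball of radius eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)      # illustrative loss choice
    grad, = torch.autograd.grad(loss, x_adv)     # gradient of loss w.r.t. input
    return (x_adv + eps * grad.sign()).detach()  # x' = x + eps * sign(grad)
```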