We investigate the vulnerability of inputs in an adversarial setting and demonstrate that certain samples are more susceptible to adversarial perturbations than others. Specifically, we employ a simple yet effective approach to quantify the adversarial vulnerability of inputs, which relies on the clipped gradients of the loss with respect to the input. Our observations indicate that inputs with a low zero gradient percentage (ZGP), i.e., a low fraction of zero gradient components, tend to be more vulnerable to attacks. These findings are supported by a theoretical analysis of a linear model and empirical evidence on deep neural networks. Across all datasets we test, inputs with the lowest zero gradient percentage are, on average, 34.5% more susceptible to adversarial attacks than randomly selected inputs. Additionally, we demonstrate that the zero gradient percentage, as a metric, transfers across different model architectures. Finally, we propose a novel black-box attack pipeline that enhances the efficiency of conventional query-based black-box attacks and show that input pre-filtering based on ZGP can boost attack success rates, particularly under low perturbation levels. On average, across all datasets we test, our approach outperforms the conventional shadow-model-based and query-based black-box attack pipelines by 44.9% and 30.4%, respectively.
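The abstract does not specify the exact computation, but the metric it describes, the fraction of input-gradient components that are (effectively) zero after clipping, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model, loss choice, and the `clip_threshold` parameter are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def zero_gradient_percentage(model, x, y, clip_threshold=1e-6):
    """Sketch of a ZGP-style score: the fraction of components of the
    input gradient whose magnitude is clipped to zero.

    Assumptions (not from the paper): cross-entropy loss and a fixed
    clipping threshold `clip_threshold` below which a gradient
    component is treated as zero.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # Count the proportion of gradient components clipped to zero.
    zero_mask = grad.abs() <= clip_threshold
    return zero_mask.float().mean().item()
```

Under the paper's observation, inputs with a low score from such a function would be flagged as more vulnerable and could be prioritized when pre-filtering candidates for a query-based black-box attack.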