Hard-label black-box textual adversarial attacks are highly challenging because text is discrete and non-differentiable and the attacker can observe only the model's predicted labels, not its gradients or confidence scores. Research on this problem is still in its early stages, and the performance and efficiency of existing methods leave room for improvement. For instance, exchange-based and gradient-based attacks may become trapped in local optima and require excessive queries, hindering the generation of adversarial examples with high semantic similarity and low perturbation rates under limited query budgets. To address these issues, we propose HyGloadAttack (adversarial Attacks via Hybrid optimization and Global random initialization), a novel framework for crafting high-quality adversarial examples. After global random initialization, HyGloadAttack applies a perturbation matrix in the word embedding space to locate nearby adversarial examples and selects synonyms that maximize semantic similarity while preserving adversarial behavior. Furthermore, we introduce a gradient-based quick search method to accelerate the optimization process. Extensive experiments on five text classification and natural language inference datasets, as well as two real-world APIs, demonstrate that HyGloadAttack significantly outperforms state-of-the-art baselines.
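To make the hard-label setting concrete, the following minimal Python sketch illustrates the two-phase pattern the abstract describes: a global random initialization that flips the predicted label, followed by a greedy tightening step that restores original words to raise similarity while the example remains adversarial. The helpers `predict_label`, `get_synonyms`, and `similarity` are hypothetical placeholders (a victim model's top-label API, a synonym source, and a sentence-similarity scorer); this is an assumption-laden illustration, not HyGloadAttack's actual implementation.

```python
import random

def global_init(words, true_label, predict_label, get_synonyms, max_tries=100):
    """Hard-label initialization: randomly substitute synonyms until the
    victim model's predicted label flips (only the label is observable)."""
    for _ in range(max_tries):
        cand = list(words)
        for i in random.sample(range(len(words)), k=max(1, len(words) // 2)):
            syns = get_synonyms(words[i])  # hypothetical synonym source
            if syns:
                cand[i] = random.choice(syns)
        if predict_label(cand) != true_label:
            return cand  # a (possibly heavily perturbed) adversarial start
    return None  # initialization failed within the query budget

def tighten(orig, adv, true_label, predict_label, similarity):
    """Greedily restore original words while the example stays adversarial,
    increasing semantic similarity and lowering the perturbation rate."""
    best = list(adv)
    for i in range(len(orig)):
        if best[i] != orig[i]:
            trial = list(best)
            trial[i] = orig[i]
            if (predict_label(trial) != true_label
                    and similarity(orig, trial) >= similarity(orig, best)):
                best = trial
    return best
```

In the full method, this greedy phase would be replaced by the hybrid optimization the abstract names (embedding-space perturbation matrices plus a gradient-based quick search), which aims to escape the local optima that purely greedy substitution can get stuck in.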