Abstract

We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: the search algorithm, the search space, and the search budget. In past work, when a new search algorithm is proposed, the attack search space is often modified alongside it. Without ablation studies benchmarking the search algorithm change with the search space held constant, one cannot tell whether an increase in attack success rate comes from an improved search algorithm or from a less restrictive search space. Additionally, many previous studies fail to properly consider the search algorithms’ run-time cost, which is essential for downstream tasks like adversarial training. Our experiments provide a reproducible benchmark of search algorithms across a variety of search spaces and query budgets to guide future research in adversarial NLP. Based on our experiments, we recommend greedy attacks with word importance ranking when under a time constraint or attacking long inputs, and either beam search or particle swarm optimization otherwise.

Highlights

  • Research has shown that current deep neural network models lack the ability to make correct predictions on adversarial examples (Szegedy et al., 2013)

  • Across three datasets and three search spaces, we found that beam search and particle swarm optimization are the best algorithms in terms of attack success rate

  • We can empirically confirm that beam and greedy search algorithms scale quadratically with input length, while word importance ranking scales linearly (a back-of-the-envelope query count sketching why follows these highlights)
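
To make the scaling claim in the last highlight concrete, here is the rough query count it reflects. This is a sketch under assumed conditions (n input words, k candidate substitutions per word, one model query per candidate), not the paper's exact accounting:

\[
Q_{\text{greedy}} = \sum_{t=1}^{n} k\,(n - t + 1) = O(kn^2),
\qquad
Q_{\text{WIR}} = \underbrace{n}_{\text{ranking by deletion}} + \underbrace{kn}_{\text{one greedy pass}} = O(kn).
\]

Plain greedy search re-scores candidates at every remaining position on each of up to n iterations, hence the quadratic term; word importance ranking pays n queries once to order the words, then makes a single pass in that order.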

Introduction

Research has shown that current deep neural network models lack the ability to make correct predictions on adversarial examples (Szegedy et al., 2013). Morris et al. (2020b) formulated the process of generating natural language adversarial examples as a system of four components: a goal function, a set of constraints, a transformation, and a search algorithm. Such a system searches for a perturbation from x to x′ that fools a predictive NLP model by both achieving some goal (like fooling the model into predicting the wrong classification label) and fulfilling certain constraints.

Search Algorithm: Recent methods proposed for generating adversarial examples in NLP frame their approach as a combinatorial search problem. This is necessary because of the exponential nature of the search space: the graph of all potential adversarial examples for a given input is far too large for an exhaustive search.
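
To make the four-component framing concrete, below is a minimal, runnable sketch of greedy search with word importance ranking, the strategy the abstract recommends under a time constraint. The toy sentiment model, synonym table, constraint threshold, and function names (model_predict, transformation, constraint_ok, greedy_wir_attack) are illustrative stand-ins, not the paper's or any particular library's implementation:

# A minimal sketch of the four components: goal function, constraints,
# transformation, and search algorithm (greedy with word importance ranking).
TOY_SYNONYMS = {"good": ["fine", "decent"], "great": ["solid", "okay"],
                "love": ["like", "enjoy"]}
POSITIVE = {"good", "great", "love", "fine", "enjoy"}

def model_predict(words):
    # Toy victim "model": P(positive) grows with the positive-word count.
    hits = sum(w in POSITIVE for w in words)
    return hits / (hits + 1.0)

def transformation(word):
    # Transformation: candidate substitutes; this defines the search space.
    return TOY_SYNONYMS.get(word, [])

def constraint_ok(original, perturbed, max_frac=0.4):
    # Constraint: bound the fraction of words modified.
    changed = sum(a != b for a, b in zip(original, perturbed))
    return changed / len(original) <= max_frac

def greedy_wir_attack(words, query_budget=500):
    queries = 1
    base = model_predict(words)
    # Word importance ranking: score each word once by the drop in model
    # confidence when it is deleted, so ranking costs O(n) queries.
    drops = []
    for i in range(len(words)):
        drops.append((base - model_predict(words[:i] + words[i + 1:]), i))
        queries += 1
    drops.sort(reverse=True)
    # Greedy pass in importance order, keeping the best substitution found.
    best, best_score = list(words), base
    for _, i in drops:
        for cand in transformation(words[i]):
            if queries >= query_budget:
                return None  # search budget exhausted
            trial = best[:i] + [cand] + best[i + 1:]
            if not constraint_ok(words, trial):
                continue
            queries += 1
            score = model_predict(trial)
            if score < best_score:
                best, best_score = trial, score
        if best_score < 0.5:  # goal function: predicted label flipped
            return best
    return None

print(greedy_wir_attack("i love this great movie".split()))

Under this framing, swapping in a different search algorithm (for example, beam search keeping the b best candidates per step) while holding the transformation and constraints fixed is exactly the kind of controlled comparison the paper argues ablation studies should make.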
