An adversarial example is an input that a neural network misclassifies even though it differs only slightly from an input that the network classifies correctly. Adversarial examples are used to augment neural network training data, measure the vulnerability of neural networks, and provide intuitive, human-understandable interpretations of neural network output. Although the literature defines adversarial examples as similar to authentic input from a human perspective, it measures similarity with mathematical norms that are not scientifically correlated with human perception. Our main contributions are to construct a genetic algorithm (GA) that generates adversarial examples more similar to authentic input than existing methods do, and to demonstrate with a survey that humans judge those adversarial examples to be more visually similar to authentic input than those produced by existing methods. The GA incorporates a neural network, and we test many parameter sets to determine which fitness function, selection operator, mutation operator, and neural network generate adversarial examples most visually similar to authentic input. We establish which mathematical norms correlate most strongly with human perception, which permits future research to incorporate the human perspective without testing many norms or conducting intensive surveys with human subjects. We also document a tradeoff between speed and quality: although existing adversarial methods are faster, the GA produces adversarial examples of higher quality in terms of visual similarity and feasibility. We apply the GA to the Modified National Institute of Standards and Technology (MNIST) and Canadian Institute for Advanced Research (CIFAR-10) datasets.
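The abstract describes the GA only at a high level. As a rough illustration, and not the authors' exact method, the sketch below shows a generic genetic loop that evolves perturbed copies of an image against a classifier. The `predict_proba` callable, the fitness function (misclassification reward minus an L2 similarity penalty), and the Gaussian pixel mutation are all assumptions for the sake of the example; the paper compares multiple fitness functions, selection operators, and mutation operators.

```python
# Minimal sketch of a GA for adversarial example generation, assuming a
# hypothetical `predict_proba(batch) -> class-probability array` callable.
# This is an illustrative sketch, not the paper's implementation.
import numpy as np

def fitness(population, original, true_label, predict_proba, alpha=0.5):
    """Reward misclassification, penalize L2 distance from the original image."""
    probs = predict_proba(population)                    # shape: (pop, classes)
    misclass = 1.0 - probs[:, true_label]                # low prob. of true class
    dist = np.linalg.norm(
        (population - original).reshape(len(population), -1), axis=1
    )
    return misclass - alpha * dist / (dist.max() + 1e-8)

def mutate(population, rate=0.01, scale=0.1):
    """Perturb a random subset of pixels with small Gaussian noise."""
    mask = np.random.rand(*population.shape) < rate
    noise = np.random.normal(0.0, scale, size=population.shape)
    return np.clip(population + mask * noise, 0.0, 1.0)

def generate_adversarial(original, true_label, predict_proba,
                         pop_size=50, generations=200):
    """Evolve perturbed copies of `original` until the classifier is fooled."""
    population = np.clip(
        original + np.random.normal(0.0, 0.05, size=(pop_size,) + original.shape),
        0.0, 1.0,
    )
    best = population[0]
    for _ in range(generations):
        scores = fitness(population, original, true_label, predict_proba)
        # Truncation selection: keep the better half, refill by mutating copies.
        elite = population[np.argsort(scores)[-pop_size // 2:]]
        population = np.concatenate([elite, mutate(elite.copy())])
        scores = fitness(population, original, true_label, predict_proba)
        best = population[np.argmax(scores)]
        if np.argmax(predict_proba(best[None])[0]) != true_label:
            return best                                  # adversarial example found
    return best
```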