Abstract

Adversarial attack techniques in deep learning have been studied extensively due to their stealthiness to human eyes and potentially dangerous consequences when applied to real-life applications. However, current attack methods in black-box settings mainly rely on a large number of queries to craft their adversarial examples, making them very likely to be detected and responded to by the target system (e.g., an artificial intelligence (AI) service provider) due to the high traffic volume. A recent proposal that addresses the large-query problem utilizes a gradient-free approach based on the Particle Swarm Optimization (PSO) algorithm. Unfortunately, this original approach tends to have a low attack success rate, possibly because the search has difficulty escaping local optima. This obstacle can be overcome by employing a multi-group approach to the PSO algorithm, in which the PSO particles can be redistributed, preventing them from being trapped in local optima. In this paper, we present a black-box adversarial attack that significantly increases the success rate of the PSO-based attack while maintaining a low number of queries by launching the attack in a distributed manner. Attacks are executed from multiple nodes, disseminating queries among the nodes, hence reducing the possibility of being recognized by the target system while also increasing scalability. Furthermore, we utilize Multi-Group PSO with Random Redistribution (MGRR-PSO) for perturbation generation, which performs better than the original approach against local optima and thus achieves a higher success rate. Additionally, we propose to efficiently remove excessive perturbation (i.e., perturbation pruning) by again utilizing MGRR-PSO rather than the standard iterative method used in the original approach. We perform five different experiments: comparing our attack's performance with existing algorithms, testing in high-dimensional space on the ImageNet dataset, examining our hyperparameters (i.e., particle size, number of clients, and search boundary), and testing a real digital attack on Google Cloud Vision. Our attack achieves a 100% success rate on the MNIST and CIFAR-10 datasets and successfully fools Google Cloud Vision as proof of a real digital attack, while maintaining a low number of queries and wide applicability.
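At the heart of the method described above is a multi-group PSO in which groups of particles (candidate perturbations) search in parallel and are periodically rescattered at random so the swarm can escape local optima. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the loss function `loss_fn` (one black-box query per evaluation), the redistribution schedule, and all constants are illustrative.

```python
import numpy as np

# Minimal sketch of a multi-group PSO with random redistribution (MGRR-PSO)
# for black-box perturbation search. `loss_fn(image, perturbation)` stands in
# for one query to the target model (lower is better for the attacker); all
# names, constants, and the redistribution schedule here are illustrative.

def mgrr_pso(image, loss_fn, n_groups=4, n_particles=10, steps=100,
             bound=0.1, redistribute_every=20, w=0.7, c1=1.5, c2=1.5):
    dim = image.size
    # One swarm per group: positions are candidate perturbations.
    pos = np.random.uniform(-bound, bound, (n_groups, n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_loss = np.array([[loss_fn(image, p) for p in grp] for grp in pos])

    for t in range(steps):
        for g in range(n_groups):
            # Each group follows its own best particle (group best).
            gbest = pbest[g, np.argmin(pbest_loss[g])]
            r1, r2 = np.random.rand(2)
            vel[g] = (w * vel[g]
                      + c1 * r1 * (pbest[g] - pos[g])
                      + c2 * r2 * (gbest - pos[g]))
            pos[g] = np.clip(pos[g] + vel[g], -bound, bound)
            loss = np.array([loss_fn(image, p) for p in pos[g]])
            improved = loss < pbest_loss[g]
            pbest[g, improved] = pos[g, improved]
            pbest_loss[g, improved] = loss[improved]

        # Random redistribution: periodically rescatter the currently worst
        # group so the search does not stay trapped in a local optimum.
        if (t + 1) % redistribute_every == 0:
            worst = np.argmax(pbest_loss.min(axis=1))
            pos[worst] = np.random.uniform(-bound, bound, (n_particles, dim))
            vel[worst] = 0.0

    best_g, best_p = np.unravel_index(np.argmin(pbest_loss), pbest_loss.shape)
    return pbest[best_g, best_p].reshape(image.shape)  # best perturbation found
```

A dummy call such as `mgrr_pso(np.zeros((28, 28)), lambda img, d: float(np.abs(d).sum()))` exercises the sketch; in an actual black-box attack, `loss_fn` would issue one query to the remote model per evaluation, and in the distributed setting described above those queries would be spread across multiple attacking nodes so that no single source produces a conspicuous traffic volume.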

Highlights

  • Over the years, deep learning has found use in a broad range of applications to perform various tasks, ranging from image classification and object recognition to social network analysis [1], to name a few

  • The primary metric for black-box attacks is the attack success rate, i.e., the rate at which generated adversarial examples are misclassified when input to the target model (see the sketch after this list)

  • A low-query attack that achieves a high attack success rate means that the attack can be carried out more effectively and quickly, making it harder for the artificial intelligence (AI) provider to detect
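As a concrete reading of that metric, a minimal sketch is given below; `model_predict`, standing for one black-box query that returns a predicted label, is a hypothetical placeholder.

```python
import numpy as np

# Minimal reading of the attack success rate: the fraction of crafted
# adversarial examples misclassified by the target model. `model_predict`
# is a hypothetical black-box query returning a predicted label.

def attack_success_rate(adv_examples, true_labels, model_predict):
    preds = np.array([model_predict(x) for x in adv_examples])
    return float(np.mean(preds != np.array(true_labels)))
```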


Summary

Introduction

The use of deep learning has been found in a broad range of applications to perform various tasks, ranging from image classification and object recognition to social network analysis [1], to name a few. One of the most intriguing vulnerabilities found to date is the adversarial example, in which images that appear similar to human eyes are perceived differently by deep learning classifiers. This susceptibility is the basis of adversarial attacks, in which a small perturbation (i.e., carefully crafted "noise") imperceptible to human vision is added to an input image, degrading the model's performance and even resulting in image misclassification [2]. Earlier methods for generating a perturbed image for adversarial attacks mainly run under white-box settings, which are less applicable to real-world applications. In this approach, adversaries have information about the target model's structure, parameters, training dataset, or even the learned weights. Such methods are also time-consuming and impractical due to the expensive use of a linear search method to find the optimal value [2].
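To make the notion of an adversarial example concrete, the toy sketch below adds a bounded perturbation to an input image; the `classify` function, the [0, 1] pixel range, and the `eps` bound are illustrative assumptions rather than details from the paper.

```python
import numpy as np

# Toy illustration of crafting an adversarial example, assuming pixel values
# in [0, 1] and a hypothetical `classify(image)` returning a label. The
# perturbation is bounded by `eps` so the change stays visually imperceptible,
# yet it may still flip the model's prediction.

def make_adversarial(image, perturbation, eps=0.05):
    delta = np.clip(perturbation, -eps, eps)   # keep the noise small
    return np.clip(image + delta, 0.0, 1.0)    # remain a valid image

# adv = make_adversarial(original, perturbation)
# classify(original) != classify(adv)  # misclassified despite a tiny change
```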

