Recent studies have shown that machine learning models are susceptible to imperceptible adversarial perturbations. These studies, however, focus on laboratory settings in which the attacker has access to internal information about the victim model or to feedback such as class probabilities. A gap therefore remains between theory and the physical world, and the risk of adversarial attacks under more extreme and realistic conditions still needs to be characterized. Here we propose a knowledge-restricted black-box attack model in which the attacker can observe only the final predicted label. We further model the attacker as resource-restricted, for example limited in the number of queries. These restrictions on knowledge and resources mean that previous methods cannot be applied directly. For this setting, the current state-of-the-art method is the boundary attack, but it requires a large number of queries. In this paper, we make several contributions toward investigating the vulnerability of machine learning models in more realistic scenarios. First, we reformulate the optimization problem, measuring the quality of sample points by L2 distance. Second, we provide a more effective algorithm based on a cutting-plane method combined with local optimization. Third, we propose two effective and easy-to-implement dynamic defense strategies. Finally, we conduct an experimental evaluation on MNIST, Fashion-MNIST, and malware detection datasets. The results show that (1) compared with the state-of-the-art method, our cutting-plane method reduces the number of queries while maintaining attack effectiveness; (2) the dynamic defense strategy is effective against label-only adversarial attacks, reducing the attack success rate from nearly 100% to 23% while preserving considerable classification accuracy; and (3) the improved defense strategy preserves the effectiveness of the defense while improving the stability of the whole model.
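To make the label-only, query-limited threat model concrete, the following is a minimal illustrative sketch, not the paper's cutting-plane algorithm: the attacker may only call a classifier that returns a predicted label, spends a fixed query budget, and judges candidate adversarial points purely by their L2 distance to the original input, as in the problem formulation above. The function names (`label_only_attack`, `classify`) and all parameter values are hypothetical.

```python
import numpy as np

def label_only_attack(classify, x0, x_start, n_queries=1000, step=0.1, rng=None):
    """Illustrative decision-based (label-only) attack loop.

    classify : callable returning a predicted label; the only feedback
               available under this threat model.
    x0       : original, correctly classified input.
    x_start  : any input already classified differently from x0; the loop
               shrinks its L2 distance to x0 without leaving the wrong class.
    Returns the closest adversarial point found within the query budget.
    """
    rng = np.random.default_rng() if rng is None else rng
    y0 = classify(x0)                      # one query for the clean label
    best = x_start.astype(float).copy()
    best_dist = np.linalg.norm(best - x0)  # quality measured by L2 distance

    for _ in range(n_queries):
        # Propose a candidate: step toward x0 plus a small random jitter.
        candidate = best + step * (x0 - best)
        candidate = candidate + 0.1 * step * best_dist * rng.normal(size=x0.shape)
        # Accept only if the label is still wrong and the point is closer.
        if classify(candidate) != y0:
            dist = np.linalg.norm(candidate - x0)
            if dist < best_dist:
                best, best_dist = candidate, dist
    return best, best_dist

# Toy usage with a hypothetical linear classifier (assumption, not from the paper).
w = np.array([1.0, -2.0])
classify = lambda x: int(x @ w > 0.0)
x0 = np.array([3.0, 1.0])        # classified as 1
x_start = np.array([-3.0, 2.0])  # classified as 0
x_adv, d = label_only_attack(classify, x0, x_start, n_queries=500)
print(classify(x_adv), round(d, 3))
```

In this sketch every proposal costs one query, so the total query count is what a query-limited attacker must minimize; the paper's cutting-plane method with local optimization is aimed at reducing exactly this budget relative to the boundary attack.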