Abstract

Machine learning models are vulnerable to adversarial examples. In this paper, we study hard-label black-box attacks, the most realistic attack setting in practice. The main limitation of existing attacks is that they require a large number of model queries, making them inefficient and even infeasible in practice. Inspired by the success of fuzz testing in traditional software engineering and computer security, we propose fuzzing-based hard-label black-box attacks against machine learning models. We design an AdvFuzzer to explore multiple paths between a source image and a guidance image, and a LocalFuzzer to explore the space around a given input to identify potential adversarial examples. We demonstrate that our fuzzing attacks are feasible and effective, generating successful adversarial examples with a significantly reduced number of model queries and smaller L0 distance. More interestingly, given a successful adversarial example generated by our attacks or by other attacks, LocalFuzzer can immediately generate additional successful adversarial examples, even with smaller L2 distance from the source example.
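The sketch below illustrates the general idea behind a LocalFuzzer-style neighborhood search as described in the abstract; it is not the authors' implementation. It assumes a hypothetical hard-label oracle query_label(x) that returns only the predicted class, and it mutates a known adversarial example to find neighbors that remain misclassified while moving closer (in L2) to the source image.

```python
# Minimal sketch of a LocalFuzzer-style local search (illustrative only).
# `query_label` is a hypothetical hard-label black-box oracle, not an API
# from the paper; replace it with real model queries.
import numpy as np

def query_label(x):
    # Hypothetical oracle: returns only the predicted class label for image x.
    raise NotImplementedError

def local_fuzz(source, adv, true_label, n_mutants=50, n_pixels=5, step=0.05, seed=0):
    """Randomly mutate a few pixels of `adv`; keep mutants that stay
    misclassified and track the one with the smallest L2 distance to `source`."""
    rng = np.random.default_rng(seed)
    best, best_dist = adv.copy(), np.linalg.norm(adv - source)
    found = []
    for _ in range(n_mutants):
        mutant = best.copy()
        flat = mutant.reshape(-1)
        idx = rng.choice(flat.size, size=n_pixels, replace=False)
        # Nudge a few randomly chosen pixels toward the source image,
        # which can only shrink the L2 distance for those pixels.
        flat[idx] += step * (source.reshape(-1)[idx] - flat[idx])
        mutant = np.clip(mutant, 0.0, 1.0)
        if query_label(mutant) != true_label:      # still adversarial
            dist = np.linalg.norm(mutant - source)
            found.append(mutant)
            if dist < best_dist:                   # closer to the source image
                best, best_dist = mutant, dist
    return best, found
```

Under these assumptions, each loop iteration costs one model query, which is why a local search of this kind can cheaply yield additional adversarial examples once a first one is available.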
