Abstract

State-of-the-art deep learning methods can be vulnerable: imperceptible, carefully crafted perturbations can induce unexpected behaviors. In this paper, we introduce two novel adversarial example detection methods based on pixel value diversity. First, we propose two independent metrics that each assess pixel value diversity, which reflects the spread of the pixel values in an image. We then observe that adversarial examples differ from clean images on both metrics, regardless of the attack method. Based on this observation, for either metric we can set a threshold and compare an image's metric value against it to detect whether the image is an adversarial example. Against several popular attack methods, experimental results on a variety of datasets show that our approach detects adversarial examples better than the state-of-the-art detection method. We also show that our methods are reliable even against adaptive attacks.
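
The abstract does not specify the two diversity metrics, so the following is only a minimal sketch of the threshold-based detection scheme it describes, assuming two illustrative diversity measures (standard deviation of pixel values and the fraction of distinct 8-bit values) and hypothetical threshold values:

```python
import numpy as np


def pixel_std(image: np.ndarray) -> float:
    """Spread of pixel values, measured as their standard deviation."""
    return float(np.std(image))


def distinct_value_ratio(image: np.ndarray) -> float:
    """Fraction of the 256 possible 8-bit values that occur in the image."""
    return len(np.unique(image)) / 256.0


def is_adversarial(image: np.ndarray,
                   std_threshold: float = 60.0,       # assumed threshold
                   distinct_threshold: float = 0.5    # assumed threshold
                   ) -> bool:
    """Flag the image if either diversity metric exceeds its threshold.

    The comparison direction and threshold values are assumptions for
    illustration; in practice they would be calibrated on clean data.
    """
    return (pixel_std(image) > std_threshold
            or distinct_value_ratio(image) > distinct_threshold)


# Usage: run the detector on a random 8-bit RGB image as a stand-in input.
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(is_adversarial(img))
```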
