Abstract

Adversarial examples are a challenging threat to machine learning models in terms of trustworthiness and security. By applying small perturbations to input data, it is possible to drive the decision of a deep learning model into failure, which can be catastrophic in applications such as autonomous driving, security surveillance, or other critical systems that increasingly rely on machine learning technologies. On the one hand, a body of research proposes attack techniques to generate adversarial examples against an ever-growing range of models and datasets. On the other hand, efforts are also being made to defend against adversarial examples. One family of defense methods aims at detecting whether an input sample is adversarial or legitimate. This work proposes an adversarial example detection method based on the application of chaos theory to evaluate the perturbations that the input introduces in the deep network. The assumption is that adversarial inputs trigger chaotic behavior in the network. For this purpose, Lyapunov exponents are used to evaluate the chaoticity of the network activations, which allows adversarial perturbations to be detected. Adversarial attacks such as Carlini and Wagner, Elastic Net, and Projected Gradient Descent are used in the experiments, achieving a detection rate of 60% in the most difficult scenarios and up to 100% for most of the combinations of attack, dataset, and network tested.
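The abstract does not specify how the Lyapunov exponents are computed from the activations; the following is a minimal illustrative sketch, assuming that the flattened activations of successive layers for a clean input and its perturbed copy are treated as two nearby trajectories, and that a finite-time, largest-exponent-style quantity is estimated from the growth of their separation. All names and the toy data below are hypothetical, not the authors' implementation.

```python
# Illustrative sketch (assumption, not the paper's exact method): estimate an
# exponential divergence rate between a clean and a perturbed activation
# trajectory, analogous to a finite-time largest Lyapunov exponent.
import numpy as np

def lyapunov_estimate(activation_pairs, eps=1e-12):
    """activation_pairs: list of (clean, perturbed) flattened layer outputs,
    one pair per layer, each a 1-D numpy array.
    Returns the mean log growth rate of the separation across layers."""
    seps = [np.linalg.norm(c - p) + eps for c, p in activation_pairs]
    # Log ratio of successive separations approximates the local expansion
    # rate introduced by each layer; positive values indicate chaotic-like
    # divergence, negative values indicate contraction.
    rates = [np.log(seps[i + 1] / seps[i]) for i in range(len(seps) - 1)]
    return float(np.mean(rates))

# Toy usage with random "layer outputs" for a clean input and a slightly
# perturbed copy of it.
rng = np.random.default_rng(0)
clean = [rng.normal(size=64) for _ in range(6)]
perturbed = [a + rng.normal(scale=0.01, size=64) for a in clean]
print(lyapunov_estimate(list(zip(clean, perturbed))))
```

Under this reading, a detector would threshold such an exponent estimate: inputs whose perturbations are amplified through the layers (large positive values) would be flagged as adversarial, while legitimate inputs would yield smaller or negative values.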
