Abstract

Nowadays, deep neural networks are used for a variety of tasks across a wide range of application areas. Despite achieving state-of-the-art results in computer vision and image classification, neural networks are vulnerable to adversarial attacks: various attacks have been presented in which small perturbations of an input image, imperceptible to the human eye, are sufficient to change a model's predictions. In this paper, we propose a multi-class detector framework based on image statistics. We implemented a detection scheme for each attack and evaluated our detectors against Attack on Attention (AoA) and FGSM, achieving detection rates of 70% and 75%, respectively, with an FPR of . The multi-class detector identifies 77% of attacks as adversarial while leaving 90% of the benign images unflagged, demonstrating that we can detect out-of-the-box attacks.
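One of the attacks evaluated above, FGSM (the fast gradient sign method of Goodfellow et al.), perturbs an input in the direction of the sign of the loss gradient: x_adv = x + ε · sign(∇_x L(x, y)). The following is a minimal sketch of that idea on a toy logistic-regression "model"; the weights, loss, and image here are illustrative assumptions, not the paper's setup or code.

```python
import numpy as np

# Hedged sketch of FGSM on a toy model (NOT the paper's implementation).
rng = np.random.default_rng(0)
w = rng.normal(size=16)  # assumed fixed model weights for a flattened 4x4 "image"
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss_wrt_input(x, y):
    # Binary cross-entropy L = -[y log p + (1-y) log(1-p)], p = sigmoid(w.x + b);
    # its gradient with respect to the input x is (p - y) * w.
    p = sigmoid(w @ x + b)
    return (p - y) * w

def fgsm(x, y, eps=0.1):
    # x_adv = x + eps * sign(grad_x L(x, y)); each pixel moves by at most eps.
    return x + eps * np.sign(grad_loss_wrt_input(x, y))

x = rng.uniform(0.0, 1.0, size=16)  # benign "image"
x_adv = fgsm(x, y=1.0, eps=0.1)
print(float(np.max(np.abs(x_adv - x))))  # bounded by eps = 0.1
```

Because the perturbation is bounded per pixel by ε, the adversarial image stays visually close to the original, which is why detection must rely on subtler signals such as image statistics rather than visual inspection.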
