Abstract

Fooling images are potential threats to deep neural networks (DNNs). Humans cannot recognize these images as natural objects such as dogs or cats, yet DNNs misclassify them as natural-object classes with high confidence. Despite this design concept, closer examination shows that existing fooling images retain some features characteristic of the target objects, to which DNNs can therefore react. In this study, we investigate whether fooling images can exist that contain no characteristic pattern of natural objects, either locally or globally. As a minimal case, we introduce single-color images with a few pixels altered, called sparse fooling images (SFIs). We first prove that SFIs always exist under mild conditions for linear and nonlinear models, and we reveal that complex models are more likely to be vulnerable to SFI attacks. Using two SFI generation methods, we demonstrate that in deeper layers, SFIs acquire features similar to those of natural images and therefore successfully fool DNNs. Among the other layers, we find that the max-pooling layer is a cause of the vulnerability to SFIs. We also discuss defenses against SFIs and their transferability. This study highlights a new vulnerability of DNNs by introducing a novel class of images that are distributed extremely far from natural images.
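To make the notion of an SFI concrete, the sketch below constructs a single-color image in which only a handful of pixels may deviate from the base color and searches for pixel values that raise a pretrained classifier's confidence in an arbitrary target class. This is a minimal illustration under assumed settings (a torchvision ResNet-50, a random-search budget, and an illustrative target class index), not the paper's actual SFI generation methods.

```python
# Minimal sketch (not the paper's SFI generation methods): random search that
# perturbs a few pixels of a single-color image and keeps changes which raise
# the confidence of an assumed target class. Model, class index, and budget
# below are illustrative; ImageNet normalization is omitted for brevity.
import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

def target_confidence(img, target):
    """Softmax confidence of `target` for one image tensor of shape (3, 224, 224)."""
    with torch.no_grad():
        logits = model(img.unsqueeze(0))
    return torch.softmax(logits, dim=1)[0, target].item()

base = torch.full((3, 224, 224), 0.5)  # uniform gray single-color image
target_class = 207                     # illustrative ImageNet class index
num_pixels, steps = 5, 500             # sparsity budget and search iterations

# Fix a small set of pixel locations; only these may deviate from the base color.
locations = [(torch.randint(224, (1,)).item(), torch.randint(224, (1,)).item())
             for _ in range(num_pixels)]

best = base.clone()
best_conf = target_confidence(best, target_class)
for _ in range(steps):
    candidate = best.clone()
    y, x = locations[torch.randint(num_pixels, (1,)).item()]
    candidate[:, y, x] = torch.rand(3)  # try a new color at one allowed pixel
    conf = target_confidence(candidate, target_class)
    if conf > best_conf:
        best, best_conf = candidate, conf

print(f"target-class confidence after search: {best_conf:.3f}")
```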
