Although Convolutional Neural Networks (CNNs) are among the most important algorithms in computer vision and AI-based systems, they are vulnerable to adversarial attacks, which can have dangerous consequences in real-life deployments. Consequently, testing AI-based systems against such attacks is crucial if they are to reliably support human prediction and decision-making under varying conditions. While proposing new effective attacks is important for neural network testing, it is equally important to design effective strategies for choosing the target labels of these attacks. In this paper, we therefore propose a novel similarity-driven adversarial testing methodology for target label selection. Our motivation is that CNNs, like humans, tend to make mistakes mostly among categories they perceive as similar, so the effort required to make a model predict a particular class is not equal across classes. Accordingly, we propose to select the target label for an adversarial attack as the label most or least similar to the ground truth according to different similarity measures; these choices can be treated as best- and worst-case scenarios in practical and transparent testing methodologies. As similarity is one of the key components of human cognition and categorization, the approach represents a shift towards more human-centered security testing of deep neural networks. The obtained numerical results show the superiority of the proposed methods over existing strategies in both targeted and non-targeted testing setups.
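As a rough illustration of the idea only (not the paper's exact procedure), the sketch below selects a target label as the class most or least similar to the ground-truth class. The similarity measure used here, cosine similarity between per-class mean feature embeddings, and the helper names are illustrative assumptions rather than the similarity measures studied in the paper.

```python
# Minimal sketch, assuming class similarity is estimated from feature embeddings.
# Functions and the cosine-similarity measure are illustrative assumptions.
import numpy as np

def class_prototypes(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Mean feature vector per class; features: (N, D), labels: (N,)."""
    classes = np.unique(labels)
    return np.stack([features[labels == c].mean(axis=0) for c in classes])

def similarity_matrix(prototypes: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between class prototypes."""
    normed = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return normed @ normed.T

def choose_target(sim: np.ndarray, true_label: int, mode: str = "most") -> int:
    """Pick the most similar class (best case) or least similar (worst case),
    excluding the ground-truth class itself."""
    scores = sim[true_label].copy()
    scores[true_label] = -np.inf if mode == "most" else np.inf
    return int(np.argmax(scores) if mode == "most" else np.argmin(scores))

# Example usage (feats, labels, and true_label are placeholders):
# sim = similarity_matrix(class_prototypes(feats, labels))
# target = choose_target(sim, true_label=3, mode="least")
```

The returned target label would then be passed to a targeted adversarial attack in place of a randomly chosen class.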