Low-epsilon adversarial attack against a neural network online image stream classifier

Hossein Mohasel Arjomandi, Mohammad Khalooei, Maryam Amirmazlaghani

doi:10.1016/j.asoc.2023.110760

Abstract

An adversary intercepts a stream of images between a sender and a receiver neural network classifier. To minimize its footprint, the adversary attacks only a limited number of images within the stream, and it aims to maximize the number of successful attacks among those it performs. Upon the arrival of each image and before the arrival of the next one, the adversary must irrevocably decide whether to attack the current image. The target model is a fixed deep neural network that may use any form of regularization. The adversary has query access to the target model: it can feed images to the model and obtain the loss, which may contain both regularization and classification loss terms. Since the proposed method needs the classification loss term alone, this paper also suggests a novel method by which the adversary estimates the regularization loss term and eliminates it. All images are partitioned into three groups based on their after-attack classification loss and are treated according to their group. Moreover, this paper reports promising test results on various datasets.
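The decision loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the estimator, the group boundaries, or the per-group treatment, so everything below is an assumption made for illustration. In particular, the sketch assumes that the queried loss is the sum of a non-negative classification term and a regularization term that is constant for a fixed model, that the smallest total loss observed so far serves as an (upper-bound) estimate of that constant, and that two hypothetical thresholds `low` and `high` define the three groups. The names `estimate_regularization`, `group_of`, and `online_attack` are likewise hypothetical.

```python
from typing import Iterable, List


def estimate_regularization(total_losses: Iterable[float]) -> float:
    """Assumed estimator: the smallest total loss observed.

    If total = classification + R with classification >= 0 and R constant,
    every observed total is an upper bound on R, so the minimum is the
    tightest such bound available from the queries made so far.
    """
    return min(total_losses)


def group_of(cls_loss: float, low: float, high: float) -> str:
    """Assign an after-attack classification loss to one of three groups.

    The thresholds `low` and `high` are hypothetical tuning parameters.
    """
    if cls_loss >= high:
        return "likely_success"
    if cls_loss <= low:
        return "likely_failure"
    return "uncertain"


def online_attack(
    after_attack_total_losses: Iterable[float],
    reg_estimate: float,
    low: float,
    high: float,
    budget: int,
) -> List[bool]:
    """Make one irrevocable attack/skip decision per image, in arrival order.

    Placeholder policy (an assumption): attack only images in the
    "likely_success" group while the attack budget lasts.
    """
    decisions = []
    remaining = budget
    for total in after_attack_total_losses:
        # Remove the estimated regularization term; clamp at zero because
        # the estimate may slightly exceed the true constant.
        cls_loss = max(total - reg_estimate, 0.0)
        attack = remaining > 0 and group_of(cls_loss, low, high) == "likely_success"
        if attack:
            remaining -= 1
        decisions.append(attack)
    return decisions
```

For example, `online_attack([2.5, 0.6, 5.5, 1.5, 6.0], reg_estimate=0.5, low=0.5, high=4.0, budget=1)` attacks only the third image: it is the first one whose estimated classification loss reaches `high`, and the single-attack budget is spent by the time the fifth image arrives.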

Full Text