Abstract

The Internet of Things (IoT) ecosystem, which integrates a wide variety of intelligent multimedia applications and services, has undergone a tremendous transformation over the years. Artificial Intelligence (AI), an essential approach for securing IoT-based multimedia services, has been advancing at a rapid pace. However, many machine learning systems, including advanced deep neural networks, are vulnerable to adversarial examples: by making imperceptible modifications to real examples, an adversary can drive the model's prediction far from the correct value. In this research, we introduce a deep reinforcement learning-based black-box attacker against image classification models. Unlike existing black-box attacks, which require massive numbers of queries and trials in pixel space, the proposed method compresses images into a latent space through variational inference and efficiently searches for optimal adversarial examples with actor-critic networks. Rather than performing patch-to-patch translation with generative adversarial networks as in related works, the adversarial examples are generated by gradually superimposing perturbations onto the latent representation at each step of a Markov decision process (MDP), giving the attacker high stability and good convergence. Experiments on the ImageNet dataset demonstrate that the proposed attacker can generate adversarial images for most samples within a limited number of steps, greatly reducing the accuracy of the target model.
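For illustration only, the minimal sketch below shows the general shape of such a latent-space MDP attack loop. It is not the authors' implementation: the `encode`, `decode`, and `classify` callables are hypothetical stand-ins for the variational encoder/decoder and the black-box classifier, and the actor-critic training that learns the perturbation policy is omitted.

```python
# Minimal sketch (assumptions noted above): an MDP-style attack loop that
# perturbs a latent code step by step and queries a black-box classifier.
import torch
import torch.nn as nn

LATENT_DIM = 128  # assumed latent dimensionality

class Actor(nn.Module):
    """Policy network: maps the current latent state to a bounded perturbation."""
    def __init__(self, dim: int = LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(),
            nn.Linear(256, dim), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return 0.05 * self.net(z)  # small step in latent space

def latent_mdp_attack(encode, decode, classify, image, true_label, max_steps=50):
    """Superimpose latent perturbations until the black-box model misclassifies.

    encode/decode/classify are hypothetical wrappers around the variational
    autoencoder and the target model; in the paper the actor is trained with
    an actor-critic algorithm rather than used untrained as it is here.
    """
    actor = Actor()
    z = encode(image)                        # compress image into latent space
    for _ in range(max_steps):
        z = z + actor(z)                     # MDP transition: add a perturbation
        adv = decode(z)                      # reconstruct candidate adversarial image
        pred = classify(adv).argmax(dim=-1)  # one black-box query per step
        if (pred != true_label).all():
            return adv                       # attack succeeded
    return decode(z)                         # best effort after max_steps
```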
