Abstract

An adversarial example is a specially-crafted input with subtle, intentional perturbations that cause a machine learning model to misclassify it. A plethora of papers have proposed using filters to defend effectively against adversarial example attacks. In this paper, however, we demonstrate that filter-based defenses may not be reliable. We develop AEDescaptor, a scheme to escape filter-based defenses. AEDescaptor uses a specially-crafted policy gradient reinforcement learning algorithm to generate adversarial examples even when filters interrupt the backpropagation channel on which traditional adversarial example attack algorithms rely. Furthermore, we design a customized algorithm that reduces the action space of the policy gradient reinforcement learning to accelerate AEDescaptor training while still ensuring that AEDescaptor generates successful adversarial examples. Extensive experiments demonstrate that AEDescaptor-generated adversarial examples achieve high success rates and good transferability in escaping filter-based defenses.
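The key idea in the abstract is that the attacker never needs gradients from the defended model: a policy proposes discrete perturbation actions, the filtered classifier is queried as a black box, and the policy is trained from the resulting reward via policy gradients. The paper's actual algorithm is not reproduced here; the following is a minimal REINFORCE-style sketch of that idea, with an illustrative reduced action space (perturb one of k candidate pixels by plus or minus eps). All names and hyperparameters (target_model, query_target, k, eps) are hypothetical, not taken from the paper.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in for the defended classifier. In the attack setting it can
    # only be queried; its gradients are unavailable because the filter
    # interrupts the backpropagation channel.
    target_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

    def query_target(image, true_label):
        # Black-box query: reward grows as the true-class confidence drops.
        with torch.no_grad():
            probs = target_model(image.unsqueeze(0)).softmax(dim=1).squeeze(0)
        return 1.0 - probs[true_label].item()

    # Reduced action space (illustrative): perturb one of k candidate
    # pixels by +eps or -eps, i.e. 2k discrete actions in total.
    k, eps = 64, 0.1
    pixels = torch.randint(0, 28 * 28, (k,))
    policy_logits = torch.zeros(2 * k, requires_grad=True)
    optimizer = torch.optim.Adam([policy_logits], lr=0.05)

    image = torch.rand(1, 28, 28)   # the example being perturbed
    true_label = 3

    for step in range(200):
        dist = torch.distributions.Categorical(logits=policy_logits)
        action = dist.sample()
        pixel = pixels[action % k]
        sign = 1.0 if action < k else -1.0

        perturbed = image.clone().view(-1)
        perturbed[pixel] = (perturbed[pixel] + sign * eps).clamp(0.0, 1.0)
        reward = query_target(perturbed.view(1, 28, 28), true_label)

        # REINFORCE update: gradients flow only through the policy's
        # log-probability of the sampled action, never through the
        # defended model.
        loss = -dist.log_prob(action) * reward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Because the reward comes from a no-grad query, the filter can break backpropagation through the classifier without affecting the policy update, and shrinking the action space (here, 2k actions instead of one per pixel-value combination) is what keeps such query-based training tractable.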
