Abstract

Deep neural networks (DNNs) are vulnerable to adversarial attacks. Great effort has been devoted to developing effective defenses against adversarial attacks and to finding vulnerabilities in proposed defenses. A recently proposed defense called Trapdoor-enabled Detection (TeD) deliberately injects trapdoors into DNN models to trap and detect adversarial examples targeting categories protected by TeD. TeD can effectively detect existing state-of-the-art adversarial attacks. In this paper, we propose a novel black-box adversarial attack on TeD, called Feature-Indistinguishable Attack (FIA). It circumvents TeD by crafting adversarial examples that are indistinguishable in the feature (i.e., neuron-activation) space from benign examples in the target category. To achieve this goal, FIA jointly minimizes the distance to the expectation of feature representations of benign samples in the target category and maximizes the distances to positive adversarial examples generated to query TeD in the preparation phase. A constraint ensures that the feature vector of a generated adversarial example stays within the distribution of feature vectors of benign examples in the target category. Our extensive empirical evaluation with different configurations and variants of TeD indicates that FIA can effectively circumvent TeD. FIA opens the door to developing much more powerful adversarial attacks. The FIA code is available at: https://github.com/CGCL-codes/FeatureIndistinguishableAttack.
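The joint objective described above can be sketched as a simple loss term. The following is a minimal illustrative sketch, not the authors' implementation (see the linked repository for that): the function names and weights (`fia_loss`, `alpha`, `beta`), the mean feature vector `target_mean`, and the matrix of positive adversarial-example features `positive_feats` are all hypothetical placeholders, and the in-distribution constraint on the generated feature vector mentioned in the abstract is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def fia_loss(feat_adv: torch.Tensor,
             target_mean: torch.Tensor,
             positive_feats: torch.Tensor,
             alpha: float = 1.0,
             beta: float = 1.0) -> torch.Tensor:
    """Sketch of a joint objective in the spirit of FIA.

    feat_adv:       feature vector of the candidate adversarial example, shape (d,)
    target_mean:    expectation of benign feature vectors in the target category, shape (d,)
    positive_feats: feature vectors of positive adversarial examples used to query
                    the detector in the preparation phase, shape (N, d)
    """
    # Minimize the distance to the mean benign feature of the target category.
    attract = F.mse_loss(feat_adv, target_mean)
    # Maximize the distances to the positive adversarial-example features
    # (expressed as a negative term so the whole loss is minimized).
    repel = -torch.cdist(feat_adv.unsqueeze(0), positive_feats).mean()
    return alpha * attract + beta * repel
```

In an attack loop, this loss would be minimized with respect to the input perturbation (e.g., by backpropagating through the victim model's feature extractor), under whatever perturbation budget and feature-distribution constraint the attack imposes.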
