Abstract
Adversarial examples in image classification cause neural networks to predict incorrect class labels with high confidence. Many applications of image classification, such as self-driving and facial recognition, are seriously threatened by adversarial attacks. One class of existing defense methods is preprocessing-based defense, which transforms the inputs before feeding them to the system. These methods are independent of the classification model and defend well under oblivious attacks. Image filtering is often used to evaluate the robustness of adversarial examples; however, while filtering weakens the adversarial perturbation, it also discards valuable features and thus reduces classification accuracy. Furthermore, fixed filtering parameters cannot effectively defend against adversarial attacks. This paper proposes a novel defense method that applies filters with different parameters and randomly rotates the filtered images. The output classification probabilities over these variants are averaged, which preserves classification accuracy while removing the perturbation. Experimental results show that the proposed method improves the defense capability of various models against diverse kinds of oblivious adversarial attacks. Under adaptive attacks, the transferability of adversarial examples among different models is significantly reduced.
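A minimal sketch of the averaging idea described above, assuming a Keras-style classifier `model` with a `predict` method that returns softmax probabilities; the Gaussian filter, the sigma values, the rotation range, and the number of rotations per filter are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def averaged_prediction(model, image, sigmas=(0.5, 1.0, 1.5),
                        n_rotations=4, max_angle=15.0):
    """Filter the input with several parameters, randomly rotate each
    filtered copy, and average the classification probabilities."""
    probs = []
    for sigma in sigmas:
        # Blur height and width only; leave the channel axis untouched.
        filtered = gaussian_filter(image, sigma=(sigma, sigma, 0))
        for _ in range(n_rotations):
            angle = np.random.uniform(-max_angle, max_angle)
            rotated = rotate(filtered, angle, axes=(0, 1),
                             reshape=False, mode="nearest")
            # Hypothetical classifier call returning class probabilities.
            p = model.predict(rotated[np.newaxis, ...], verbose=0)[0]
            probs.append(p)
    return np.mean(probs, axis=0)  # averaged class probabilities

# Usage sketch: predicted_label = np.argmax(averaged_prediction(model, x))
```

Averaging over several filter strengths and random rotations is what keeps a single fixed preprocessing parameter from being exploited by an adaptive attacker, while the ensemble of mildly transformed inputs keeps clean accuracy close to the unfiltered model.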