Abstract

Deep Neural Networks (DNNs) have been widely used in a variety of fields with great success. However, recent research indicates that DNNs are susceptible to adversarial attacks, which can fool well-trained DNN-based classifiers while remaining undetectable to human eyes. In this article, we propose to integrate the target DNN model with our robust bit-plane classifiers to defend against adversarial attacks. The bit-plane classifiers take bit-planes of input images as convolution inputs, motivated by our observation that successful attacks aim to generate imperceptible perturbations, which mainly affect the low-order bits of pixels in clean images. We also propose two metrics, the bit-plane perturbation rate and the channel modification rate, to further explain the robustness of bit-plane classifiers. We discuss potential adaptive attacks and find that our defense remains effective as long as the adversarial examples are qualified, i.e., their perturbations remain imperceptible. We conduct experiments on the CIFAR-10 and GTSRB datasets under both white-box and black-box attacks. The results show that on CIFAR-10 our defense method effectively increases average model accuracy from 16.23% to 83.53% under white-box attacks and from 40.65% to 88.14% under black-box attacks, without sacrificing accuracy on clean images.

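To make the bit-plane idea concrete, the following is a minimal NumPy sketch, not taken from the paper: the function name `bit_planes` and the toy perturbation of magnitude at most 2 are illustrative assumptions. It decomposes an 8-bit image into its bit-planes and shows that a small, imperceptible perturbation flips mostly the low-order planes, the observation that motivates the bit-plane classifiers.

```python
import numpy as np

def bit_planes(image: np.ndarray) -> np.ndarray:
    """Split an 8-bit image of shape (H, W, C) into its 8 bit-planes.

    Returns an array of shape (8, H, W, C); plane k holds bit k of
    every pixel, with k = 0 the least significant (low-order) bit.
    """
    assert image.dtype == np.uint8
    return np.stack([(image >> k) & 1 for k in range(8)])

# Toy illustration of the observation: a small perturbation
# (|delta| <= 2 per pixel) flips mostly the low-order bit-planes.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
noise = rng.integers(-2, 3, size=clean.shape)
adv = np.clip(clean.astype(np.int16) + noise, 0, 255).astype(np.uint8)

for k in range(8):
    flipped = (bit_planes(clean)[k] != bit_planes(adv)[k]).mean()
    print(f"bit-plane {k}: fraction of flipped bits = {flipped:.3f}")
```

In this toy example the flip rate drops off sharply with increasing bit index, which is consistent with the abstract's claim that classifiers operating on high-order bit-planes see inputs that are largely unchanged by adversarial perturbation.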