Abstract

Convolutional neural networks (CNNs) have become increasingly prevalent in applications such as image recognition and autonomous driving. However, recent literature shows that many of these deep-learning models are susceptible to small input perturbations known as adversarial attacks. In this project, adversarial attack algorithms and defenses were implemented and evaluated on a traffic-light classification neural network trained on real traffic-light data. The model's robustness was assessed against the spatial, one-pixel, Carlini & Wagner (C&W), and boundary attacks. The model proved substantially susceptible only to the spatial attack (a 56% decrease in accuracy on perturbed images) and the C&W attack (0% accuracy on perturbed images). Two defense strategies were then proposed to increase the model's resilience to these attacks. First, the model was trained on a dataset containing both perturbed and unperturbed images, which significantly weakened the spatial attack (a 22.5% decrease in its success rate). Second, because the C&W attack maintained a 100% success rate, defensive distillation was implemented. However, it provided no additional protection against the C&W attack (whose success rate was unchanged) and made the model more susceptible to the spatial attack (a 13% increase in its success rate). These results show that considerable work remains to harden applied neural networks against adversarial attacks. In summary, only the spatial and C&W attacks could fool the implemented traffic-light CNN, and the two defensive strategies were either only marginally successful or failed entirely against these attacks.
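As a rough illustration of the first defense described above (training on a mix of clean and spatially perturbed images), the sketch below augments each training batch with randomly rotated and translated copies before the usual cross-entropy update. It is a minimal sketch, not the paper's implementation: the names `model`, `train_loader`, and `optimizer` are assumed, and `torchvision.transforms.RandomAffine` stands in for the spatial attack's perturbations.

```python
# Minimal sketch of training on mixed clean + spatially perturbed batches.
# Assumes a PyTorch image classifier `model` and a DataLoader `train_loader`
# of traffic-light images; RandomAffine approximates the spatial attack's
# rotations and translations (not the attack actually used in the project).
import torch
import torch.nn.functional as F
from torchvision import transforms

spatial_perturb = transforms.RandomAffine(degrees=15, translate=(0.1, 0.1))

def train_epoch(model, train_loader, optimizer, device="cpu"):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        # Build a mixed batch: clean images plus spatially perturbed copies
        # (the same random affine is applied across the batch in this sketch).
        perturbed = spatial_perturb(images)
        mixed_x = torch.cat([images, perturbed], dim=0)
        mixed_y = torch.cat([labels, labels], dim=0)

        optimizer.zero_grad()
        loss = F.cross_entropy(model(mixed_x), mixed_y)
        loss.backward()
        optimizer.step()
```

A per-image perturbation (or perturbations generated by the actual attack) would track the defense more closely; the batch-level augmentation here is only meant to show the structure of mixing perturbed and unperturbed examples during training.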
