Abstract

Deep neural networks, which can learn representative and discriminative features from data in a hierarchical manner, have achieved state-of-the-art performance in remote sensing scene classification. Despite the great success of deep learning algorithms, their vulnerability to adversarial examples deserves special attention. In this article, we systematically analyze the threat that adversarial examples pose to deep neural networks for remote sensing scene classification. Both targeted and untargeted attacks are performed to generate subtle adversarial perturbations that are imperceptible to a human observer yet can easily fool deep learning models. Adversarial examples are generated by simply adding these perturbations to the original high-resolution remote sensing (HRRS) images, so that they differ only slightly from the originals. An intriguing finding of our study is that state-of-the-art deep neural networks misclassify most of these adversarial examples with very high confidence. This phenomenon undoubtedly limits the practical deployment of deep learning models in the safety-critical remote sensing field. To address this problem, we further investigate the adversarial training strategy, which significantly increases the resistance of deep models to adversarial examples. Extensive experiments on three benchmark HRRS image data sets demonstrate that, while most well-known deep neural networks are sensitive to adversarial perturbations, adversarial training helps alleviate their vulnerability to adversarial examples.
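The perturbation scheme the abstract describes — adding a small, gradient-derived perturbation to an input so the model's prediction changes while the image looks unaltered — can be sketched with the well-known fast gradient sign method (FGSM). The snippet below is a minimal illustration, not the paper's implementation: it uses a toy linear softmax classifier in NumPy in place of a deep network, and all names (`fgsm_perturbation`, `W`, `b`, the choice of `epsilon`) are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def xent(x, y, W, b):
    # Cross-entropy loss of the linear softmax model on input x, true label y.
    return -np.log(softmax(W @ x + b)[y])

def fgsm_perturbation(x, y_true, W, b, epsilon):
    """Untargeted FGSM: one step of size epsilon in the sign of the
    input gradient of the loss, which increases the loss on x."""
    p = softmax(W @ x + b)
    grad_logits = p.copy()
    grad_logits[y_true] -= 1.0      # d(cross-entropy)/d(logits)
    grad_x = W.T @ grad_logits      # chain rule back to the input
    return epsilon * np.sign(grad_x)

# Toy 2-class linear "model" standing in for a deep scene classifier.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))
b = np.zeros(2)
x = rng.normal(size=8)                       # stand-in for an image
y = int(np.argmax(softmax(W @ x + b)))       # model's clean prediction

x_adv = x + fgsm_perturbation(x, y, W, b, epsilon=0.5)
y_adv = int(np.argmax(softmax(W @ x_adv + b)))
print("clean prediction:", y, "| adversarial prediction:", y_adv)
```

For this convex toy model the FGSM step provably increases the loss on the clean label; for deep networks the same one-step heuristic often suffices to flip the prediction, which is the vulnerability the article studies. Adversarial training, as investigated in the article, would feed such perturbed inputs back into training with their correct labels.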
