Abstract

Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure. In this paper, we study previously unexplored factors affecting the adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. We focus on adversarial black-box settings, in which the attacker does not have full access to the target model and usually uses another model, commonly referred to as the surrogate model, to craft adversarial examples that are then transferred to the target model. We consider this to be the most realistic scenario for MedIA systems. Firstly, we study the effect of weight initialization (pre-training on ImageNet or random initialization) on the transferability of adversarial attacks from the surrogate model to the target model, i.e., how effective attacks crafted using the surrogate model are on the target model. Secondly, we study the influence of differences in development (training and validation) data between the target and surrogate models. We further study the interaction of weight initialization and data differences with differences in model architecture. All experiments were done with a perturbation degree tuned to ensure maximal transferability at minimal visual perceptibility of the attacks. Our experiments show that pre-training may dramatically increase the transferability of adversarial examples, even when the target and surrogate architectures are different: the larger the performance gain from pre-training, the larger the transferability. Differences in the development data between the target and surrogate models considerably decrease the performance of the attack; this decrease is further amplified by differences in model architecture. We believe these factors should be considered when developing security-critical MedIA systems intended for deployment in clinical practice. We recommend avoiding reliance solely on standard components, such as pre-trained architectures and publicly available datasets, as well as avoiding disclosure of design specifications, in addition to using adversarial defense methods. When evaluating the vulnerability of MedIA systems to adversarial attacks, various attack scenarios and target-surrogate differences should be simulated to achieve realistic robustness estimates. The code and all trained models used in our experiments are publicly available.
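To make the black-box transfer setting above concrete, the following PyTorch sketch crafts adversarial examples with a surrogate model using the fast gradient sign method and measures how often they fool a separate target model. This is an illustrative sketch, not the code released with the paper; the names surrogate, target, and loader, and the perturbation budget epsilon, are placeholder assumptions.

    # Illustrative sketch only (not the paper's released code): craft adversarial
    # examples on a surrogate model and measure how often they fool the target.
    import torch
    import torch.nn.functional as F

    def fgsm_on_surrogate(surrogate, x, y, epsilon):
        # One fast-gradient-sign step using gradients of the surrogate only;
        # inputs are assumed to be scaled to [0, 1].
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(surrogate(x_adv), y)
        loss.backward()
        return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

    def transfer_success_rate(surrogate, target, loader, epsilon=2 / 255):
        # Fraction of samples on which the transferred attack changes the
        # target model's prediction away from the true label.
        fooled, total = 0, 0
        for x, y in loader:
            x_adv = fgsm_on_surrogate(surrogate, x, y, epsilon)
            with torch.no_grad():
                pred = target(x_adv).argmax(dim=1)
            fooled += (pred != y).sum().item()
            total += y.numel()
        return fooled / total

In this setup, the factors studied in the paper correspond to how the surrogate differs from the target in weight initialization, development data, and architecture.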

Highlights

  • We observed that pre-training on ImageNet may dramatically increase the transferability of adversarial examples in MedIA systems; the larger the performance gain achieved by pre-training, the larger the transferability and the more vulnerable the pre-trained system is to attacks crafted with pre-trained surrogate models

  • We showed that disparity in development data and model architecture between target and surrogate models can substantially decrease the success of attacks

Introduction

Deep learning (DL) has been shown to achieve close or even superior performance to that of experts in many medical image analysis (MedIA) applications, including in ophthalmology (Gulshan et al., 2016; Ting et al., 2017), radiology (Rajpurkar et al., 2017), and pathology (Bejnordi et al., 2017; Bulten et al., 2020; Wetstein et al., 2020). A threat to DL systems is posed by so-called adversarial attacks (Szegedy et al., 2013). Such attacks apply a carefully engineered, subtle perturbation to the input of the target model to cause misclassification. Apart from this type of attack, images can be manipulated to change their content: for example, signs of disease can be removed from a diseased image or added to a healthy image (Xia et al., 2020; Sun et al., 2020; Baumgartner et al., 2018; Becker et al., 2019), which, in turn, can change network predictions. Developing such synthetically changed images remains challenging (Xia et al., 2020), as it is hard to guarantee that they look realistic and to control which image structures are changed, and these algorithms may be difficult to train and require large training datasets. We consider adversarial attacks to be a more feasible and more likely type of attack on MedIA systems, which is why we have limited our scope to them.
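As a concrete illustration of such a perturbation, the sketch below implements a generic iterative attack (projected gradient descent) that keeps the perturbation inside a small L-infinity ball so the adversarial image remains visually close to the original. This is a standard formulation shown for illustration, not necessarily the configuration used in our experiments; epsilon, alpha, and the number of steps are placeholder values.

    # Generic projected gradient descent (PGD) attack, shown for illustration:
    # the perturbation is kept within an L-infinity ball of radius epsilon so
    # the adversarial image stays visually close to the original.
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, epsilon=2 / 255, alpha=0.5 / 255, steps=10):
        x_orig = x.clone().detach()
        x_adv = x_orig.clone()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                # Ascend the loss, then project back into the epsilon-ball
                # and the valid image intensity range.
                x_adv = x_adv + alpha * grad.sign()
                x_adv = x_orig + (x_adv - x_orig).clamp(-epsilon, epsilon)
                x_adv = x_adv.clamp(0, 1)
        return x_adv.detach()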
