Abstract

Recent research demonstrates that deep learning models are vulnerable to membership inference attacks. The few defenses that have been proposed either compromise the performance or quality of the target model or fail to effectively resist such attacks. This paper proposes an adversarial-example-based privacy-preserving technique (AEPPT), which adds crafted adversarial perturbations to the target model's predictions to mislead the adversary's membership inference model. The added perturbations do not affect the accuracy of the target model, yet they prevent the adversary from inferring whether a specific sample is in the target model's training set. Since AEPPT only modifies the target model's original output, it requires no modification or retraining of the target model. Experimental results show that the proposed method reduces the inference accuracy and precision of the membership inference model to around 50%, which is close to a random guess. The recall of the membership inference model drops from 88.24% to 6.48% on TinyImageNet, from 98.5% to 17.1% on Purchase, from 97.70% to 40.30% on ImageNet, and from 88.7% to 51.9% on CIFAR100. In addition, the performance of the proposed method is evaluated under various factors (i.e., perturbation step size, amount of the adversary's data, proportion of member and non-member data used to train the substitute membership inference model, number of the target model's output classes, and different membership inference models), demonstrating that it can resist membership inference attacks under different conditions. Moreover, AEPPT remains effective against adaptive attacks in which the adversary knows the defense mechanism. Compared with state-of-the-art defenses, the proposed defense significantly degrades the accuracy and precision of membership inference attacks to 50% (i.e., a random guess) while leaving the normal performance and utility of the target model unaffected.
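The core mechanism can be illustrated with a minimal sketch, not the authors' implementation: take small gradient steps on the released confidence vector against a substitute membership inference model, pushing its membership score toward 0.5, while renormalizing and preserving the argmax so the predicted label, and hence the target model's accuracy, is unchanged. The logistic-regression substitute, step size, and iteration count below are all illustrative assumptions.

```python
# Minimal sketch of the AEPPT idea (not the paper's code): perturb the
# target model's confidence vector so that a substitute membership
# inference model outputs a score near 0.5, without changing the label.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def craft_perturbed_confidences(conf, w, b, step=0.01, iters=50):
    """conf: original softmax output (1-D array); (w, b): parameters of a
    hypothetical logistic-regression substitute inference model. Returns a
    perturbed confidence vector with the same argmax as conf but whose
    membership score is pushed toward 0.5 (random guess)."""
    label = conf.argmax()
    adv = conf.copy()
    for _ in range(iters):
        score = sigmoid(w @ adv + b)           # substitute's membership score
        # Gradient of the loss (score - 0.5)^2 w.r.t. the confidence vector
        grad = 2.0 * (score - 0.5) * score * (1.0 - score) * w
        candidate = adv - step * grad          # gradient step toward 0.5
        candidate = np.clip(candidate, 1e-6, None)
        candidate /= candidate.sum()           # keep a valid distribution
        if candidate.argmax() != label:        # never alter the prediction
            break
        adv = candidate
    return adv

# Toy usage: a confident "member-like" output becomes less distinguishable
# to the substitute, while the predicted class is preserved.
rng = np.random.default_rng(0)
w, b = rng.normal(size=10), 0.0                # assumed substitute parameters
conf = np.array([0.9] + [0.1 / 9] * 9)
adv = craft_perturbed_confidences(conf, w, b)
print(adv.argmax() == conf.argmax())           # True: label unchanged
```

Because only the released output vector is transformed in this way, the sketch is consistent with the abstract's claim that the defense needs no retraining of the target model itself.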
