Defending against model extraction attacks with OOD feature learning and decision boundary confusion

Chuang Liang, Jie Huang, Zeping Zhang, Shuaishuai Zhang

doi:10.1016/j.cose.2023.103563

Abstract

Recent studies have demonstrated that Deep Neural Networks (DNNs) are vulnerable to model extraction attacks. In these attacks, malicious users query the victim model with Out-Of-Distribution (OOD) data and use the returned predictions to train a clone model whose accuracy is similar to that of the victim model. The attacks succeed because the victim model applies the decision boundary learned from its training data to the attack data, so the prediction vectors it returns reveal information about that boundary. To counter model extraction attacks, we propose a novel defense method that performs additional training on auxiliary data to form a defense model with a confused decision boundary. Specifically, when the defense model receives attack data that contain auxiliary features, it identifies them and produces predictions that hinder reconstruction of the decision boundary of the training data. The defense model therefore reduces the accuracy of the clone model while having minimal impact on accuracy for benign data. Moreover, defenders can detect attack samples from malicious users and prohibit these users from accessing the victim model. We further analyze the impact of various auxiliary data on defense effectiveness and present a methodology for designing auxiliary data. In evaluations against several extraction attacks, our approach both decreases the accuracy of clone models and detects attack samples. It also outperforms existing state-of-the-art defense methods.
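The extraction attack described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the models or the defense evaluated in the paper: the victim is a hypothetical linear softmax classifier, the OOD attack data are uniform random vectors, and every name (`query_victim`, `train_clone`, `agreement`) is invented for illustration. The attacker sees only prediction vectors, yet the clone recovers most of the victim's decision boundary.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def query_victim(W_victim, x):
    # Hypothetical black-box API: the attacker only receives prediction vectors.
    return softmax(x @ W_victim)

def train_clone(x_attack, soft_labels, n_classes, lr=0.5, steps=500):
    # Fit a linear softmax clone by gradient descent on cross-entropy
    # between the clone's outputs and the victim's prediction vectors.
    W = np.zeros((x_attack.shape[1], n_classes))
    for _ in range(steps):
        probs = softmax(x_attack @ W)
        grad = x_attack.T @ (probs - soft_labels) / len(x_attack)
        W -= lr * grad
    return W

def agreement(W_a, W_b, x):
    # Fraction of inputs on which two linear models predict the same class.
    return float(np.mean((x @ W_a).argmax(axis=1) == (x @ W_b).argmax(axis=1)))

rng = np.random.default_rng(0)
W_victim = rng.normal(size=(5, 3))               # assumed victim parameters
x_attack = rng.uniform(-3, 3, size=(2000, 5))    # assumed OOD attack queries
x_benign = rng.normal(size=(1000, 5))            # stand-in for benign data

W_clone = train_clone(x_attack, query_victim(W_victim, x_attack), n_classes=3)
fidelity = agreement(W_victim, W_clone, x_benign)
print(f"clone/victim agreement on benign inputs: {fidelity:.3f}")
```

In this toy setting the leak comes entirely from the prediction vectors returned for OOD queries, which is the channel the proposed defense aims to corrupt for inputs that carry auxiliary features.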

Full Text