Abstract

Machine Learning as a Service (MLaaS) is a promising paradigm that provides personalized inference functions to clients through paid APIs. Nevertheless, it is vulnerable to model extraction attacks, in which an attacker can obtain a functionally equivalent copy of the victim model by repeatedly querying the APIs with crafted samples. Although numerous defenses against model extraction attacks have been proposed, existing efforts suffer from individual limitations and limited comprehensiveness. In this paper, we propose AMAO, a comprehensive defense framework against model extraction attacks. AMAO consists of four interlinked, successive phases: adversarial training is first exploited to weaken the effectiveness of model extraction attacks; malicious query detection then flags suspicious queries and marks their issuers as malicious users; next, we develop a label-flipping poisoning attack that guides the adaptive query responses returned to malicious users, with the image pHash algorithm employed to keep the perturbed responses indistinguishable from honest ones; finally, the perturbed results serve as a backdoor for verifying the ownership of any suspicious model. Extensive experiments demonstrate that AMAO outperforms existing defenses against model extraction attacks and remains robust against an adaptive adversary who is aware of the defense.
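
As a rough illustration of the adaptive-response phase described above, the following Python sketch (not taken from the paper; `predict`, `is_malicious`, `NUM_CLASSES`, and the flipping rule are hypothetical stand-ins for AMAO's actual components) shows how a perceptual hash can key a deterministic label flip, so that perceptually similar malicious queries receive consistent perturbed labels that could later act as a backdoor watermark:

```python
import hashlib

import imagehash
from PIL import Image

NUM_CLASSES = 10  # hypothetical task size; the paper does not fix this


def predict(image: Image.Image) -> int:
    """Stand-in for the protected model's top-1 prediction (dummy here)."""
    return 3


def is_malicious(user_id: str, image: Image.Image) -> bool:
    """Stand-in for AMAO's malicious-query detection phase (dummy here)."""
    return user_id.startswith("suspect")


def respond(user_id: str, image: Image.Image) -> int:
    """Return an honest label to benign users and a deterministically
    flipped label to users flagged as malicious."""
    true_label = predict(image)
    if not is_malicious(user_id, image):
        return true_label
    # Perceptually similar images share a pHash, so near-duplicate probes
    # from an attacker receive the same flipped label, keeping the
    # perturbed responses consistent and hard to distinguish.
    key = str(imagehash.phash(image))
    digest = hashlib.sha256(key.encode()).digest()
    offset = 1 + digest[0] % (NUM_CLASSES - 1)  # offset != 0: label always flips
    return (true_label + offset) % NUM_CLASSES


if __name__ == "__main__":
    img = Image.new("RGB", (64, 64), color=(120, 30, 200))
    print(respond("benign-user", img))   # honest label: 3
    print(respond("suspect-007", img))   # consistent flipped label
```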
