Adversarial Training of Anti-Distilled Neural Network with Semantic Regulation of Class Confidence

Zi Wang, Husheng Li, Chengcheng Li

doi:10.1109/icip46576.2022.9897169

Abstract

Knowledge distillation (KD) has been identified as an effective knowledge transfer approach. By learning from the outputs of a pre-trained, over-parameterized teacher network, a compact student network can be trained efficiently to achieve superior performance. Although KD has achieved substantial success, exposing pre-trained models often creates a risk of intellectual property leakage. From a model-stealing attacker's perspective, one can easily mimic the model's functionality via KD, resulting in huge financial loss. In this paper, we propose a novel adversarial training framework called semantic nasty teacher, which prevents the teacher model from being copied by the attacker. Specifically, when training the teacher model, we disentangle the semantic relationships in its output logits, since these relationships are the key to successful KD. Experimental results show that neural networks trained with our approach sacrifice only a little performance while eliminating the possibility of KD-based model stealing.
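
For readers unfamiliar with how a student learns from a teacher's output logits, the following is a minimal sketch of the widely used temperature-softened KD objective. It is not the semantic nasty teacher method proposed in this paper; the function names (softmax, kd_loss) and the default temperature of 4.0 are assumptions chosen for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Illustrative only: the temperature value is an assumption, and this is
    the generic KD objective, not the defense described in the abstract.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Example: a student that matches the teacher's logits incurs zero loss,
# while one that ranks the classes differently incurs a positive loss.
teacher = [1.0, 2.0, 3.0]
print(kd_loss(teacher, teacher))
print(kd_loss([3.0, 2.0, 1.0], teacher))
```

The softened teacher distribution carries the relative confidences among non-target classes, which is the semantic relationship an attacker would exploit through KD and which the abstract says the proposed training aims to disentangle.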
