Abstract

Knowledge distillation (KD) is an effective approach to knowledge transfer. By learning from the outputs of a pre-trained, over-parameterized teacher network, a compact student network can be trained efficiently to achieve superior performance. Despite the success of KD, exposing a pre-trained model creates a risk of intellectual property leakage: a model-stealing attacker can easily mimic the model's functionality via KD, resulting in substantial financial loss. In this paper, we propose a novel adversarial training framework called semantic nasty teacher, which prevents the teacher model from being copied by the attacker. Specifically, when training the teacher model, we disentangle the semantic relationships in its output logits, since these relationships are the key to the success of KD. Experimental results show that neural networks trained with our approach sacrifice only a little performance while preventing KD-based model stealing.
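
For readers unfamiliar with how a student learns from a teacher's outputs, the following is a minimal sketch of the commonly used temperature-softened distillation loss, that is, the KL divergence between the teacher's and the student's softened output distributions. It is an illustration of standard KD under stated assumptions, not the training objective of the proposed framework; the function names `log_softmax` and `kd_loss` and the default temperature of 4.0 are hypothetical choices made for this example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) over temperature-softened distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures. The default temperature is an
    assumption for illustration only.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [6.0, 2.5, 1.0]   # made-up logits for a 3-class problem
    student = [3.0, 3.0, 0.5]
    print("loss vs. teacher:", kd_loss(student, teacher))
    print("loss when identical:", kd_loss(teacher, teacher))
```

The loss is zero when the student reproduces the teacher's softened distribution and positive otherwise. The relative sizes of the non-maximal teacher probabilities are what carry the inter-class (semantic) relationships that the student exploits in this formulation.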
