Abstract

Knowledge distillation is a deep learning method that mimics the way humans teach: a teacher network is used to guide the training of a student network. Knowledge distillation can produce an efficient student network that is easy to deploy on resource-constrained edge computing devices. Existing studies typically mine knowledge from a teacher network and transfer it to a student network. The student can only passively receive this knowledge; it cannot understand how the teacher acquired it, which limits the student’s performance improvement. Inspired by the old Chinese saying “Give a man a fish and you feed him for a day; teach a man how to fish and you feed him for a lifetime,” this work proposes a Skill-transferring Knowledge Distillation (SKD) method to boost a student network’s ability to create new, valuable knowledge. SKD consists of two main meta-learning networks: Teacher Behavior Teaching and Teacher Experience Teaching. The former captures a teacher network’s learning behavior in its hidden layers and can predict the teacher’s subsequent behavior from its previous ones. The latter models the optimal empirical knowledge of the teacher network’s output layer at each learning stage. With their help, the teacher network can provide the student with its predicted subsequent behavior and its optimal empirical knowledge at the current stage. The overall performance of SKD is verified through its application to multiple object recognition tasks and comparison with the state of the art.
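
For context, the setting SKD extends is standard logit-based knowledge distillation, in which the student is trained against the teacher’s temperature-softened outputs alongside the ground-truth labels. The sketch below is a minimal, generic version of that baseline in PyTorch, not the SKD method itself; the function name kd_loss and the settings T=4.0 and alpha=0.9 are illustrative assumptions rather than values from the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic knowledge-distillation loss (soft + hard targets)."""
    # Soft-target term: KL divergence between the temperature-softened
    # teacher and student distributions; the T*T factor restores the
    # gradient scale reduced by the temperature.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy with ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Weighted sum; alpha trades off teacher imitation vs. label fitting.
    return alpha * soft + (1.0 - alpha) * hard
```

In this baseline, the student only imitates the teacher’s final outputs; SKD’s two meta-learning networks additionally transfer how the teacher’s hidden-layer behavior evolves and what its optimal output-layer knowledge looks like at each learning stage.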
