Abstract

The need for faster, more lightweight, and flexible Deep Learning (DL) models has led to a wide variety of methods for building them. Among the best-known methods for improving the accuracy of lightweight DL models is knowledge distillation, also known as knowledge transfer. Knowledge distillation improves the effectiveness of the training process by transferring the knowledge encoded in a large and complex neural network into a smaller and faster one. This chapter provides an introduction to knowledge distillation by presenting some of the most representative approaches, equipping the reader with the knowledge and tools needed to apply these methods in practice and to follow this rapidly advancing field. First, we present the seminal neural network distillation approach, which kick-started the field. We then present a generalization of this approach that offers a probabilistic view of knowledge distillation, allowing it to go beyond classification tasks and overcoming significant limitations of earlier methods. Multilayer knowledge distillation approaches, which employ multiple layers for knowledge transfer and further improve the effectiveness of distillation, are also presented and discussed. Finally, we present and discuss more advanced ways to train the teacher model that lead to more effective distillation, including online distillation methods, which train the teacher and student models simultaneously, and self-distillation methods, which dispense with a separate teacher model altogether and reuse the knowledge extracted from the student model itself.

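To make the core idea concrete, the sketch below illustrates the classic soft-target distillation loss in PyTorch: a cross-entropy term on the ground-truth labels combined with a KL-divergence term between temperature-softened teacher and student predictions. The specific hyperparameter values (`temperature`, `alpha`) and the random-tensor usage example are illustrative assumptions, not details taken from the chapter.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.5):
    """Minimal sketch of the classic soft-target distillation loss.

    Combines standard cross-entropy on the true labels with a KL-divergence
    term pulling the student's temperature-softened predictions towards the
    teacher's. `temperature` and `alpha` are illustrative hyperparameters.
    """
    # Hard-label loss: cross-entropy against the ground-truth targets.
    hard_loss = F.cross_entropy(student_logits, targets)

    # Soft-label loss: KL divergence between temperature-softened student
    # and teacher distributions. The T^2 factor keeps gradient magnitudes
    # comparable across different temperatures.
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2

    return alpha * hard_loss + (1.0 - alpha) * soft_loss


if __name__ == "__main__":
    # Random tensors stand in for a real batch; in practice the logits come
    # from a frozen teacher network and a trainable student network.
    batch, classes = 8, 10
    student_logits = torch.randn(batch, classes, requires_grad=True)
    teacher_logits = torch.randn(batch, classes)
    targets = torch.randint(0, classes, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, targets)
    loss.backward()
    print(loss.item())
```

The weighting between the two terms controls how strongly the student follows the teacher's softened outputs versus the hard labels; the multilayer and online variants discussed in the chapter extend this basic objective rather than replace it.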