Abstract

Deep learning has advanced dramatically in recent years, and large convolutional neural networks (CNNs) in particular have shown outstanding performance on a wide variety of tasks. However, such large-scale CNNs are often impractical to deploy because they require substantial computational resources. There is therefore a need for small CNNs whose performance approaches that of large ones. One approach to this problem is distillation, which uses a large, high-performance model as the teacher and a small model as the student, and compresses the model by transferring the teacher's knowledge to the student. In standard distillation, however, the student learns to mimic the teacher even when the teacher misclassifies the training input, which can prevent the student from learning a better mapping. In this paper, we propose Merit-based Distillation, in which the accuracy of the teacher model's output determines the degree to which the student model mimics that output, thereby preventing the student from learning undesirable mappings. Experiments evaluate the effect of Merit-based Distillation and show that the proposed method improves the generalization of the student model more than other known methods.
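The abstract does not specify how the teacher's accuracy is turned into a weight on the distillation term, so the following is only a minimal sketch of the general idea: a per-sample knowledge-distillation loss scaled by a "merit" weight derived from the teacher's output. Here the merit is assumed, purely for illustration, to be the teacher's softmax probability on the ground-truth class; the function name, `temperature`, and `alpha` are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F


def merit_weighted_distillation_loss(student_logits, teacher_logits, labels,
                                     temperature=4.0, alpha=0.5):
    """Sketch of a merit-weighted distillation loss (illustrative only)."""
    # Standard cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Temperature-softened distributions used for distillation.
    t_soft = F.softmax(teacher_logits / temperature, dim=1)
    s_log_soft = F.log_softmax(student_logits / temperature, dim=1)

    # Per-sample KL divergence between teacher and student outputs.
    kd_per_sample = F.kl_div(s_log_soft, t_soft, reduction="none").sum(dim=1)

    # Assumed merit weight: the teacher's confidence in the correct class,
    # near 0 when the teacher is wrong and near 1 when it is confidently right.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits, dim=1)
        merit = teacher_probs.gather(1, labels.unsqueeze(1)).squeeze(1)

    kd_loss = (merit * kd_per_sample).mean() * (temperature ** 2)
    return alpha * ce_loss + (1.0 - alpha) * kd_loss
```

Under this reading, samples the teacher gets wrong contribute little to the imitation term, so the student falls back on the ground-truth cross-entropy for those inputs.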
