Abstract

As deep neural networks are widely used in computer vision, model compression methods based on knowledge distillation are being actively investigated so that these networks can be deployed on smaller devices. However, when there is a significant capacity gap between the teacher and student models, the student's performance degrades, and narrowing the gap with additional labeled data requires a great deal of manual labeling effort. In this paper, we propose an approach that fuses knowledge distillation from the teacher with semi-supervised teacher assistants, effectively bridging the large teacher-student gap using a large amount of unlabeled data and a small amount of labeled data. The method proceeds through unsupervised teacher pre-training and fine-tuning, followed by teacher-assistant offline distillation, student-teacher-assistant online distillation, and student-teacher offline distillation. We validate the effectiveness of the proposed method on the classification task using CIFAR-10, CIFAR-100, and ImageNet.
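As a point of reference for the distillation stages listed above, the following is a minimal sketch of the standard soft-target distillation objective that each teacher/assistant/student pair would typically minimize. The function name, temperature `T`, and mixing weight `alpha` are illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of a soft-target distillation loss (assumed setup, not the
# paper's exact objective): KL divergence between temperature-softened teacher
# and student logits, optionally mixed with cross-entropy on labeled samples.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels=None, T=4.0, alpha=0.9):
    # Soften both distributions with temperature T and scale by T^2,
    # as in standard knowledge distillation.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    if labels is None:
        # Unlabeled data: learn from the teacher's (or assistant's) soft targets only.
        return soft_loss
    # Labeled data: mix in the supervised cross-entropy term.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In a pipeline like the one described, the same loss would be applied first with the teacher distilling into the assistant, and then with the assistant (and finally the teacher) distilling into the student, with the unlabeled branch handling the bulk of the data.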
