Abstract

Knowledge distillation (KD) aims to build a lightweight deep neural network by training it under the guidance of a large-scale teacher model. Although KD improves model efficiency, the performance gap between the teacher and the trained student remains significant, because the teacher's knowledge is not transferred effectively: the mapping landscape of the large-scale teacher model is not fully explored. To close this gap, we propose a novel Hybrid Mix-up Contrastive Knowledge Distillation (HMCKD) approach that enables a thorough and reliable exploration of the mapping solution space and thereby substantially improves student performance. Specifically, we design a hybrid mixing strategy, combining image-level and feature-level mixing, to form a smoother mapping landscape that provides stronger guidance and transfers richer dark knowledge from the teacher to the student. We further apply two complementary strategies, contrastive learning and top-k guided selection, to ensure more effective knowledge transfer from the teacher. Extensive experiments show that HMCKD outperforms state-of-the-art knowledge distillation methods on six publicly available datasets: CIFAR-100, CIFAR-100-C, STL-10, SVHN, TinyImageNet, and ImageNet. On CIFAR-100 in particular, the average student accuracy improves by 1.47%. Moreover, both the visualization results and similarity quantifications confirm that the knowledge gap between the teacher and student models is narrowed. Our source code is available at https://github.com/lambett/HMCKD.
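As a rough illustration of the image-level and feature-level mixing described above, the following PyTorch-style sketch combines pairs of inputs and intermediate features with a Beta-sampled coefficient before querying the teacher and student. All names and hyperparameters here (e.g., `mix_images`, `mix_features`, `alpha`) are illustrative assumptions, not the authors' HMCKD implementation; see the repository above for the actual method.

```python
# Minimal mixup-style sketch, assuming standard convex mixing; not the authors' code.
import torch

def mix_images(x, alpha=1.0):
    """Image-level mixing: convexly combine each image with a shuffled partner."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[perm], perm, lam

def mix_features(f, perm, lam):
    """Feature-level mixing: reuse the same pairing and coefficient on intermediate features."""
    return lam * f + (1.0 - lam) * f[perm]

# Hypothetical usage with teacher/student models that return (features, logits):
# x_mix, perm, lam = mix_images(images)
# t_feat, t_logits = teacher(x_mix)
# s_feat, s_logits = student(x_mix)
# s_feat_mix = mix_features(s_feat, perm, lam)
# ...compute KD / contrastive losses between teacher and student representations...
```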
