Abstract

Vision Transformers (ViTs) have achieved unprecedented success in vision tasks with the assistance of abundant data. However, the lack of inductive bias in lightweight ViTs makes learning locality challenging on small datasets, leading to poor performance. This limitation impedes the application of lightweight ViTs in scenarios with limited data and computational power. Knowledge Distillation (KD) allows a student model to benefit from a teacher model; however, traditional single-stage KD methods deliver fixed knowledge throughout training and are therefore suboptimal for a student model whose capacity grows as learning progresses. To address these issues, we propose, for the first time, a simple yet effective two-stage KD method called Curriculum Information Knowledge Distillation (CIKD). Specifically, we incorporate an easy-to-difficult curriculum learning framework into the KD process. In the first stage, Attention Locality Imitation (ALI), the student model learns locality from the teacher model's low-level semantic features through self-attention distillation. In the second stage, Logit Mimicking (LM), the student model learns label information and high-level semantic logits from the teacher model. Without bells and whistles, our approach achieves state-of-the-art results on 8 small-scale datasets with ViT-Tiny (5.0M parameters). Our code and model weights are available at: https://github.com/newLLing/CIKD.
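To make the two stages concrete, the sketch below shows one plausible form of the ALI and LM losses described above. It is not the authors' released implementation (see their repository for that); the MSE attention-matching loss, the temperature-scaled KL divergence, and the hyperparameters T and alpha are illustrative assumptions.

    # Minimal sketch of two-stage CIKD-style losses; hypothetical, not the official code.
    import torch
    import torch.nn.functional as F

    def attention_locality_imitation(student_attn, teacher_attn):
        """Stage 1 (ALI): match the student's self-attention maps to the
        teacher's low-level attention maps. Assumes both tensors have shape
        (batch, heads, tokens, tokens) and come from comparable early blocks."""
        return F.mse_loss(student_attn, teacher_attn)

    def logit_mimicking(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Stage 2 (LM): standard logit KD plus cross-entropy on ground-truth
        labels. T and alpha are assumed values, not taken from the paper."""
        kd = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1.0 - alpha) * ce

In this reading, training would first optimize attention_locality_imitation so the lightweight student acquires locality, then switch to logit_mimicking once its representations have matured; how the stages are scheduled or weighted is specified in the paper, not here.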
