Abstract

In knowledge distillation (KD), a lightweight student model achieves improved test accuracy by mimicking the behavior of a large pre-trained model (the teacher). However, the cumbersome teacher model often produces over-confident predictions, which generalize poorly to unseen data. Consequently, a student trained by such a teacher inherits this problem. To mitigate this issue, we present a new KD framework, dubbed coded knowledge distillation (CKD), in which the student is instead trained to mimic the behavior of a coded teacher. Compared to the teacher in KD, the coded teacher in CKD has an additional adaptive encoding layer at the front, which adaptively encodes an input image into a compressed version (using JPEG encoding, for instance) and then feeds the compressed image to the pre-trained teacher. Comprehensive experimental results demonstrate the effectiveness of CKD over KD. In addition, we extend the use of the coded teacher to other knowledge transfer methods, showing that it improves test accuracy across these methods as well.

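The sketch below illustrates the coded-teacher idea described in the abstract in a standard PyTorch KD setup: the input image is round-tripped through JPEG before being passed to the frozen teacher, and the student is trained against the coded teacher's softened outputs. The fixed JPEG quality, the temperature `T`, the weight `alpha`, and the helper names (`jpeg_compress`, `coded_teacher_logits`, `ckd_loss`) are assumptions for illustration; the paper's actual encoding layer is adaptive and its exact policy and loss are not specified in the abstract.

```python
# Minimal sketch of the CKD idea, assuming a conventional PyTorch KD pipeline.
# The adaptive quality-selection policy is replaced here by a fixed quality
# (an assumption, not the authors' method).
import io

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

to_pil = transforms.ToPILImage()
to_tensor = transforms.ToTensor()


def jpeg_compress(images: torch.Tensor, quality: int) -> torch.Tensor:
    """Round-trip a batch of images (N, C, H, W, values in [0, 1]) through JPEG."""
    out = []
    for img in images:
        buf = io.BytesIO()
        to_pil(img.cpu()).save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        out.append(to_tensor(Image.open(buf).convert("RGB")))
    return torch.stack(out).to(images.device)


def coded_teacher_logits(teacher: torch.nn.Module, images: torch.Tensor,
                         quality: int = 50) -> torch.Tensor:
    """Coded teacher: encode the input (plain JPEG at a fixed quality, as a
    placeholder for the adaptive encoder) and feed it to the frozen teacher."""
    with torch.no_grad():
        return teacher(jpeg_compress(images, quality))


def ckd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
             labels: torch.Tensor, T: float = 4.0, alpha: float = 0.9) -> torch.Tensor:
    """Standard KD objective, with the coded teacher supplying the soft targets."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In a training loop, one would compute `coded_teacher_logits(teacher, images)` per batch and minimize `ckd_loss(student(images), teacher_out, labels)`; swapping the fixed-quality compression for an input-dependent quality choice recovers the adaptive encoding layer the abstract describes.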