Abstract
Knowledge distillation is a widely used and effective technique for boosting the performance of a lightweight student network by having it mimic the behavior of a more powerful teacher network. This paper presents an end-to-end online knowledge distillation strategy in which several peer students are trained together and their predictions are aggregated into a powerful teacher ensemble. The ensembling technique uses an online supervisor network to determine the optimal way of combining the student logits. Intuitively, this supervisor network learns the area of expertise of each student and assigns a weight to each student accordingly: it has knowledge of the input image, the ground truth data, and the predictions of each individual student, and tries to answer the following question: "how much can we rely on each student's prediction, given the current input image with this ground truth class?" The proposed technique can be thought of as an inference optimization mechanism, as it improves overall accuracy for the same number of parameters. Our experiments show that the proposed knowledge distillation consistently improves the performance of the knowledge-distilled students compared with independently trained students.
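The weighted combination of student logits described above can be illustrated with a minimal sketch. The abstract does not specify how the supervisor's outputs are normalized or which distillation loss is used, so everything here is an assumption made for illustration: the per-student scores are passed through a softmax to obtain weights, the teacher ensemble is the weighted sum of student logits, and each student is pulled toward the ensemble with a temperature-softened KL divergence. The function names (`ensemble_logits`, `distillation_loss`) and the hand-written `supervisor_scores` are hypothetical; in the paper the scores come from a trained supervisor network.

```python
import math

def softmax(xs, temperature=1.0):
    # Numerically stable softmax with an optional temperature.
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_logits(student_logits, supervisor_scores):
    # Assumption: supervisor scores (one per student) are normalized
    # with a softmax so the student weights sum to 1.
    weights = softmax(supervisor_scores)
    num_classes = len(student_logits[0])
    combined = [
        sum(w * logits[c] for w, logits in zip(weights, student_logits))
        for c in range(num_classes)
    ]
    return combined, weights

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=3.0):
    # Assumption: a standard softened-KL distillation term; the
    # paper's actual loss may differ.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl_divergence(p, q)

# Three hypothetical students predicting over three classes.
students = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.2], [1.0, 1.0, 1.0]]
# Hypothetical supervisor scores: the first student is trusted most.
scores = [2.0, 0.0, -1.0]

teacher, weights = ensemble_logits(students, scores)
losses = [distillation_loss(teacher, s) for s in students]
```

In this sketch a student whose score is much higher than the others dominates the ensemble, which mirrors the "area of expertise" intuition: the teacher leans on whichever student the supervisor trusts for the current input.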