Abstract

Due to their high computational cost, the application of deep neural networks (DNNs) to real-time tasks has been limited. A possible solution is to compress the model so that the demand for computational resources is reduced. A popular approach is knowledge distillation (KD), whose basic philosophy is to transfer the knowledge extracted by a larger teacher network to a smaller student network. The common transfer strategy matches the logits or intermediate layers of the teacher and student networks one to one, which neglects the informative relationships between different samples. In this paper, we borrow the idea of metric learning to transfer the contrastive relationships learned by the teacher network to the student. Specifically, we use the well-known triplet loss to regularize the training of the student network. With a modified negative-selection strategy, our Contrastive Knowledge Distillation (CKD) method efficiently improves the performance of the student network compared with traditional KD methods. Empirical experiments on KD benchmarks and real-world datasets also demonstrate the superiority of CKD.
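To make the idea concrete, below is a minimal sketch (not the paper's exact formulation) of how a triplet-based contrastive term can be combined with standard soft-target KD and cross-entropy. The function name `ckd_loss`, the hyperparameters `T`, `alpha`, and `margin`, and the simple in-batch negative selection (rolling the batch and keeping pairs with different labels) are illustrative assumptions standing in for the modified negative-selection strategy described in the abstract.

```python
import torch
import torch.nn.functional as F

def ckd_loss(student_logits, teacher_logits,
             student_emb, teacher_emb,
             labels, T=4.0, alpha=0.5, margin=1.0):
    """Hypothetical combined loss: cross-entropy + soft-target KD +
    a triplet term that transfers pairwise (contrastive) relations.

    Anchor   = teacher embedding of a sample
    Positive = student embedding of the same sample
    Negative = student embedding of another in-batch sample with a
               different label (a simple stand-in for the paper's
               negative-selection strategy).
    """
    # Soft-target KD term (KL divergence between softened distributions).
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)

    # Build in-batch triplets: roll the batch by one position and keep
    # only pairs whose labels differ, so the rolled sample is a true negative.
    neg_idx = torch.roll(torch.arange(labels.size(0), device=labels.device), 1)
    valid = labels != labels[neg_idx]
    if valid.any():
        triplet = F.triplet_margin_loss(
            teacher_emb[valid],            # anchor
            student_emb[valid],            # positive
            student_emb[neg_idx][valid],   # negative
            margin=margin)
    else:
        triplet = student_logits.new_zeros(())

    # Supervised cross-entropy plus the two transfer terms.
    ce = F.cross_entropy(student_logits, labels)
    return ce + alpha * kd + (1.0 - alpha) * triplet
```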
