Abstract

This work addresses the problem of knowledge distillation for the deep face recognition task. Knowledge distillation is known to be an effective way of model compression: it transfers knowledge from a high-capacity teacher to a lightweight student. The knowledge and the way it is distilled can be defined in different ways depending on the problem to which the technique is applied. Since face recognition is a typical metric learning task, we propose to perform knowledge distillation at the score level. Specifically, for any pair of matching scores computed by the teacher, our method forces the student to preserve the order of the corresponding matching scores. We evaluate the proposed pairwise ranking distillation (PWR) approach on several face recognition benchmarks in both face verification and face identification scenarios. Experimental results show that PWR not only improves over the baseline method by a large margin, but also outperforms other score-level distillation approaches.
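To make the score-level ranking idea concrete, here is a minimal PyTorch-style sketch of one way such a pairwise ranking objective could look. The function name, the softplus inversion penalty, and the margin parameter are illustrative assumptions rather than the exact loss defined in the paper.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_distillation_loss(teacher_scores, student_scores, margin=0.0):
    """Sketch of a score-level pairwise ranking distillation loss.

    Both arguments are 1-D tensors of matching scores for the same set of
    face pairs, produced by the teacher and the student respectively.
    """
    # Differences between every ordered pair of scores, shape [N, N].
    t_diff = teacher_scores.unsqueeze(1) - teacher_scores.unsqueeze(0)
    s_diff = student_scores.unsqueeze(1) - student_scores.unsqueeze(0)

    # Keep only the pairs (i, j) that the teacher ranks as score_i > score_j.
    mask = (t_diff > 0).float()

    # Softplus penalty grows when the student inverts the teacher's order
    # (s_diff negative) and decays toward zero once the order is respected
    # by at least the margin.
    penalty = F.softplus(margin - s_diff)
    return (penalty * mask).sum() / mask.sum().clamp(min=1.0)

if __name__ == "__main__":
    teacher = torch.tensor([0.9, 0.2, 0.7])
    student = torch.tensor([0.8, 0.3, 0.1])  # inverts the teacher's (0.7, 0.2) pair
    print(pairwise_ranking_distillation_loss(teacher, student))
```

With margin set to zero, each inverted pair contributes roughly the size of its inversion to the loss, so gradients concentrate on the pairs where the student disagrees most with the teacher's ranking.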

Highlights

  • Face recognition systems are widely used today, and their quality keeps improving in order to meet increasing security requirements

  • Using the Labeled Faces in the Wild (LFW) [32], Cross-Pose LFW (CPLFW) [33], AgeDB [34], and MegaFace [35] datasets, we show that the proposed distillation method significantly improves face recognition quality compared to the conventional way of training the student network

  • Note that knowledge distillation based on the equality of the corresponding matching scores between teacher and student was investigated in [16]; for the sake of simplicity, we refer to this approach as relational knowledge distillation (RKD-D). A minimal sketch of this score-equality baseline is given after this list
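As a point of contrast with the ranking-based sketch above, a score-equality objective could be written as follows. The MSE form and the function name are assumptions for illustration, not the exact loss used in [16].

```python
import torch.nn.functional as F

def score_equality_distillation_loss(teacher_scores, student_scores):
    # Push the student's matching scores to equal the teacher's,
    # rather than only preserving their relative order as in PWR.
    return F.mse_loss(student_scores, teacher_scores)
```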

Summary

Introduction

Face recognition systems are widely used today, and their quality keeps improving in order to meet increasing security requirements. If the model is supposed to run on a resource-limited embedded device or to be used in a video surveillance system handling thousands of queries per second, it is often necessary to replace a large network with a smaller one to satisfy the limitations of the available computational resources. This creates a strong demand for methods that reduce model complexity while preserving its performance as much as possible. Network compression can be done in many different ways, including parameter quantization [4, 5], weight pruning [6, 7], low-rank factorization [8, 9], and knowledge distillation. All these compression methods, except for knowledge distillation, focus on reducing model size in terms of parameters while keeping the network architecture roughly the same. We found that our pairwise ranking distillation technique outperforms other score-based distillation approaches by a large margin.

Individual knowledge distillation
Relational knowledge distillation
Knowledge distillation for Face Recognition
Pairwise ranking distillation
Relational function
Pairwise inversion loss function
Experiments
Datasets
Experimental setup
Evaluation results
Conclusion