Abstract
Knowledge distillation has attracted considerable attention from computer vision researchers in recent years. However, the performance of the student model suffers when the complete dataset used to train the teacher model is unavailable. In particular, when distilling knowledge between heterogeneous models, it is difficult for the student model to learn from the teacher's guidance with only a few training samples. In this paper, a data augmentation method is proposed based on the attentional response of the teacher model. The proposed method exploits the knowledge in the teacher model without requiring the teacher and student to share a homogeneous architecture. Experimental results demonstrate that, when combined with different knowledge distillation methods, the proposed data augmentation improves the performance of the student model in knowledge distillation with few data.
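As a rough illustration of the idea summarized above, the sketch below shows one plausible way an attention-guided augmentation could be combined with standard soft-target knowledge distillation in PyTorch: the teacher's channel-aggregated feature activations are used to crop around salient image regions, and the student is trained on the augmented images with a Hinton-style distillation loss. All models, function names, and hyperparameters here are hypothetical placeholders for illustration; they are not the specific augmentation or distillation formulation of the paper.

# Illustrative sketch only: generic attention-guided crop augmentation plus
# vanilla soft-target distillation. Not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F


def spatial_attention_map(features: torch.Tensor) -> torch.Tensor:
    """Collapse a (B, C, H, W) teacher feature map into a (B, H, W) attention
    map by summing squared activations over channels and normalizing."""
    attn = features.pow(2).sum(dim=1)
    return attn / (attn.amax(dim=(1, 2), keepdim=True) + 1e-8)


def attention_guided_crop(images: torch.Tensor, attn: torch.Tensor,
                          crop_frac: float = 0.75) -> torch.Tensor:
    """Augment each image by cropping around the teacher's peak-attention
    location, then resizing back to the original resolution."""
    b, _, h, w = images.shape
    attn = F.interpolate(attn.unsqueeze(1), size=(h, w), mode="bilinear",
                         align_corners=False).squeeze(1)
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    crops = []
    for i in range(b):
        # Center the crop on the most attended pixel, clamped inside the image.
        cy, cx = divmod(attn[i].argmax().item(), w)
        top = min(max(cy - ch // 2, 0), h - ch)
        left = min(max(cx - cw // 2, 0), w - cw)
        crop = images[i:i + 1, :, top:top + ch, left:left + cw]
        crops.append(F.interpolate(crop, size=(h, w), mode="bilinear",
                                   align_corners=False))
    return torch.cat(crops, dim=0)


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target distillation (KL on softened logits) plus cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    # Toy heterogeneous teacher/student pair on random data, for illustration.
    teacher_backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
    teacher_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, 10))
    student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                            nn.Linear(8, 10))

    images = torch.randn(4, 3, 32, 32)
    labels = torch.randint(0, 10, (4,))

    with torch.no_grad():
        feats = teacher_backbone(images)
        augmented = attention_guided_crop(images, spatial_attention_map(feats))
        teacher_logits = teacher_head(teacher_backbone(augmented))

    loss = kd_loss(student(augmented), teacher_logits, labels)
    loss.backward()
    print(f"distillation loss on attention-augmented batch: {loss.item():.4f}")

In this sketch the teacher both produces the attention maps that drive the augmentation and the soft targets for distillation, so no architectural correspondence with the student is assumed, which is the property the abstract emphasizes.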