Abstract

Popular deep neural network models often suffer from high latency, difficult deployment, and demanding hardware requirements in practical applications. Knowledge distillation is an effective approach to these problems. We adopt an innovative knowledge distillation approach together with task-specific data augmentation strategies and obtain a lightweight model with a 6.7x speedup and a 13.6x compression ratio relative to the BERT-base baseline, while the lightweight model's average performance across tasks reaches 95% of BERT-base. We then investigate issues that remain in the knowledge distillation stage. To address problems in distillation model selection and model fine-tuning, we propose a teacher and student model selection strategy and a two-stage fine-tuning strategy applied before and after knowledge distillation. These two strategies further improve the average performance of the lightweight models to 98% of BERT-base. Finally, we develop a lightweight-model evaluation scheme based on different types of downstream tasks, which provides a reference for practical applications facing similar tasks.
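
To make the distillation setup concrete, the sketch below shows a standard response-based knowledge distillation loss in PyTorch, of the kind typically used when compressing BERT-base into a lighter student. This is a minimal illustration under assumed settings: the temperature, the weighting factor alpha, and the example tensors are illustrative choices, not values reported by the paper.

```python
# Minimal sketch of a response-based knowledge distillation loss (PyTorch).
# Temperature and alpha are illustrative, not the paper's settings.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-label KL divergence (teacher guidance) with hard-label cross-entropy."""
    # Soften both distributions with the temperature, then match the student to the teacher.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Example usage with random logits for a 2-class task (batch of 4).
teacher_logits = torch.randn(4, 2)                        # from the frozen teacher (e.g. BERT-base)
student_logits = torch.randn(4, 2, requires_grad=True)    # from the lightweight student
labels = torch.randint(0, 2, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

The temperature smooths the teacher's output distribution so the student can learn from the relative probabilities of incorrect classes, and alpha trades off teacher guidance against the ground-truth labels.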
