Abstract

Model lightweighting aims to address the slow training and high resource requirements of large models, and knowledge distillation offers an effective solution. We built a lightweight model that meets the competition requirements and delivers strong NLP capabilities, using RoBERTa-tiny-clue as our backbone model. We evaluated the effect of soft labels and hard labels on knowledge distillation, performed the distillation, and fine-tuned the resulting model to obtain a lighter model with better performance, which we then applied to downstream NLP tasks. We also adopted a series of data augmentation methods to improve performance on downstream tasks and customized optimization strategies for each of the four tasks. Based on the open-source pre-trained model RoBERTa-tiny-clue and publicly available datasets, our model is 15 times smaller and 10 times faster than BERT-base while retaining 95% of BERT-base's performance on downstream NLP tasks. With suitable data augmentation, the lightweight model's performance on various downstream tasks reaches or exceeds that of BERT-base.
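As a rough illustration of the soft-label versus hard-label trade-off mentioned in the abstract, the PyTorch sketch below shows one common way to combine the two objectives in knowledge distillation. This is not the authors' code; the temperature and weighting values are assumptions chosen for illustration only.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Combine a soft-label and a hard-label objective for distillation.

    student_logits, teacher_logits: (batch, num_classes) raw logits
    labels: (batch,) gold class indices (hard labels)
    temperature: softens both distributions so the student learns from
                 the teacher's relative class probabilities (assumed value)
    alpha: weight on the soft-label term; (1 - alpha) on the hard labels
    """
    # Soft-label term: KL divergence between temperature-scaled distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard-label term: ordinary cross-entropy against the gold labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Example usage with random tensors standing in for teacher/student outputs.
if __name__ == "__main__":
    batch, num_classes = 8, 3
    student_logits = torch.randn(batch, num_classes, requires_grad=True)
    teacher_logits = torch.randn(batch, num_classes)
    labels = torch.randint(0, num_classes, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(loss.item())
```

Setting alpha to 1.0 or 0.0 recovers a purely soft-label or purely hard-label setup, which is one way to compare the two signals as the abstract describes.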

