Abstract

BERT is a pre-trained language model. Although the model has proven highly performant on a variety of natural language understanding tasks, its large size makes it hard to deploy in practical settings where computing resources are limited. To improve the efficiency of BERT for the sentiment analysis task, we propose a novel distilled version of BERT. It distills knowledge from the full-size BERT model, which serves as the teacher model. Unlike previous approaches, the distilled model learns from both the last hidden state and the soft labels of the teacher model. We use a distillation learning objective that effectively transfers knowledge from the original large model to the compact model. Our model reduces the BERT model size by ∼40% while retaining ∼98.2% of its performance on the sentiment classification task. It achieves promising results on SST-2 sentiment analysis and outperforms previous distilled models.
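To make the training objective concrete, the following is a minimal sketch of a combined distillation loss of the kind the abstract describes, in which the student matches the teacher's soft labels and its last hidden state. The temperature, the loss weights, the cosine loss for hidden-state alignment, and the inclusion of a hard-label cross-entropy term are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def distillation_loss(
    student_logits: torch.Tensor,   # (batch, num_classes)
    teacher_logits: torch.Tensor,   # (batch, num_classes)
    student_hidden: torch.Tensor,   # (batch, seq_len, hidden) last hidden state
    teacher_hidden: torch.Tensor,   # (batch, seq_len, hidden) last hidden state
    labels: torch.Tensor,           # (batch,) ground-truth sentiment labels
    temperature: float = 2.0,       # assumed value
    alpha_soft: float = 0.5,        # assumed loss weights
    alpha_hidden: float = 0.3,
    alpha_hard: float = 0.2,
) -> torch.Tensor:
    # Soft-label loss: KL divergence between temperature-scaled distributions
    # of the student and the teacher.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Last-hidden-state alignment: cosine loss over token representations
    # (assumes student and teacher share the same hidden size).
    target = torch.ones(
        student_hidden.size(0) * student_hidden.size(1),
        device=student_hidden.device,
    )
    hidden_loss = F.cosine_embedding_loss(
        student_hidden.reshape(-1, student_hidden.size(-1)),
        teacher_hidden.reshape(-1, teacher_hidden.size(-1)),
        target,
    )

    # Optional supervised loss on the ground-truth labels (an assumption here).
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha_soft * soft_loss + alpha_hidden * hidden_loss + alpha_hard * hard_loss
```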
