Abstract

Fine-tuning a pre-trained language model is currently the mainstream approach to text classification. Taking fine-tuned BERT as an example, this approach has three main problems: first, training the massive number of parameters incurs high training costs; second, the model easily overfits the training samples, resulting in low transferability; third, the fine-tuned model performs poorly on long-text classification. In this paper, we take sentiment classification as the example task and use classification accuracy as the metric. For the first problem, we compare two methods: using a compressed language model (accuracy decreases by 0.1%) and completely freezing the weights (accuracy decreases by 4%). For the second problem, on top of the frozen-weight model (accuracy down 4%), we use a convolutional network (CNN) and a capsule network (CAP) to extract n-gram features and clustering features, changing the classification accuracy by +0.5% and -0.2%, respectively. A corpus-transfer test shows that CAP achieves higher accuracy than the fine-tuned model (by 0.6%). For the long-text problem, this paper proposes expanding the position embeddings to support long text input, which improves accuracy by 1.1%. Finally, we compare the training speed and parameter scale of the models under different strategy combinations and evaluate each model using F1 score, precision, recall, and accuracy.
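As a rough illustration of the frozen-weights strategy with a CNN head described in the abstract, the sketch below is a minimal, hedged example and not the authors' exact architecture: the model name, kernel sizes, and class name are assumptions. It freezes all BERT parameters and trains only a small convolutional classifier that extracts n-gram features from the token representations, using PyTorch and Hugging Face Transformers.

```python
# Minimal sketch (not the paper's exact model) of the frozen-BERT + CNN strategy.
import torch
import torch.nn as nn
from transformers import BertModel


class FrozenBertCNNClassifier(nn.Module):
    def __init__(self, num_classes=2, kernel_sizes=(2, 3, 4), num_filters=128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Freeze all BERT parameters: only the CNN head below is trained,
        # avoiding the cost of fine-tuning the full language model.
        for param in self.bert.parameters():
            param.requires_grad = False
        hidden = self.bert.config.hidden_size
        # One 1-D convolution per kernel size extracts n-gram features
        # over the frozen token representations.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            # (batch, seq_len, hidden) token representations from frozen BERT
            hidden_states = self.bert(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state
        x = hidden_states.transpose(1, 2)  # (batch, hidden, seq_len) for Conv1d
        feats = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(feats, dim=1))
```

This only sketches one of the strategies compared in the paper; the capsule-network head and the expanded position embeddings for long text follow the same pattern of keeping the language model fixed and training a lightweight component on top.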
