Abstract

There is a large performance gap between formal and informal language understanding tasks. Recent pre-trained models that improved performance on formal language understanding tasks have not achieved comparable results on informal language. We propose a data annealing transfer learning procedure to bridge the performance gap on informal natural language understanding tasks, successfully utilizing a pre-trained model such as BERT on informal language. In our data annealing procedure, the training set contains mainly formal text data at first; the proportion of informal text data is then gradually increased during training. Our data annealing procedure is model-independent and can be applied to various tasks, and we validate its effectiveness in exhaustive experiments. When BERT is trained with our procedure, it outperforms all state-of-the-art models on three common informal language tasks.
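To make the procedure concrete, the following is a minimal sketch assuming a linear schedule: each training batch mixes formal (source) and informal (target) examples, and the formal proportion decays from a high initial value toward zero. The function name and the schedule parameters (p_start, p_end, linear decay) are illustrative assumptions rather than the paper's exact settings.

```python
import random

def data_annealing_batches(formal_data, informal_data, num_steps,
                           batch_size=32, p_start=0.9, p_end=0.0):
    """Yield mixed batches whose formal-data proportion decays over training.

    p_start, p_end, and the linear decay are illustrative assumptions,
    not the paper's exact schedule.
    """
    for step in range(num_steps):
        # Fraction of the batch drawn from formal (source-domain) data,
        # interpolated from p_start down to p_end over the training run.
        p_formal = p_start + (p_end - p_start) * step / max(1, num_steps - 1)
        n_formal = round(batch_size * p_formal)
        batch = (random.sample(formal_data, n_formal) +
                 random.sample(informal_data, batch_size - n_formal))
        random.shuffle(batch)
        yield batch
```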

Highlights

  • Because of the noisy nature of informal language and the shortage of labeled data, progress on informal language is not as promising as on formal language

  • DA indicates that the model is trained with the data annealing procedure

  • When LSTM, BERT-Base, and BERT-Large are trained under our data annealing procedure, they achieve better performance than other transfer learning paradigms

Summary

Introduction and Related Work

Because of the noisy nature of informal language and the shortage of labeled data, progress on informal language understanding has not kept pace with that on formal language. Many tasks on formal data achieve high performance thanks to deep neural models (Peters et al., 2018; Devlin et al., 2018), but these state-of-the-art models' excellent performance usually fails to transfer directly to informal data. Our procedure draws on annealing: a gradually decayed learning rate gives the model more freedom of exploration at the beginning of training and leads to better final performance (Zeiler, 2012; Yang and Zhang, 2018; Devlin et al., 2018), and another widespread use of annealing is simulated annealing (Bertsimas and Tsitsiklis, 1993). Experiments validate our data annealing procedure's effectiveness when training resources in the target (informal) domain are limited.
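The annealing analogy can be made concrete: just as a learning rate or a simulated-annealing temperature is decayed over time, the proportion of formal data in each batch can follow a decaying schedule. Below is a minimal sketch assuming an exponential decay; the decay form and rate are assumptions, not hyper-parameters reported by the paper.

```python
import math

def annealed_formal_proportion(step, num_steps, p_start=0.9, decay_rate=5.0):
    """Exponentially decay the formal-data proportion over training,
    mirroring learning-rate decay and simulated-annealing temperature
    schedules. p_start and decay_rate are illustrative assumptions."""
    return p_start * math.exp(-decay_rate * step / num_steps)

# Example: proportions at the start, middle, and end of training.
# annealed_formal_proportion(0, 100)   -> 0.9
# annealed_formal_proportion(50, 100)  -> ~0.074
# annealed_formal_proportion(100, 100) -> ~0.006
```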

Data Annealing
Datasets
Model Setting
Experiment Results
Error Analysis
Conclusion
A Examples of Mispredicted Sentences on the Named Entity Recognition Task
B Dataset Statistics
C Hyper-parameters and Training Process
