Abstract
There is a huge performance gap between formal and informal language understanding tasks. Recent pre-trained models that improved performance on formal language understanding tasks have not achieved comparable results on informal language. We propose a data annealing transfer learning procedure to bridge this performance gap on informal natural language understanding tasks. It successfully applies a pre-trained model such as BERT to informal language. In our data annealing procedure, the training set contains mainly formal text data at first; the proportion of informal text data is then gradually increased during training. The procedure is model-independent and can be applied to various tasks. We validate its effectiveness in exhaustive experiments. When BERT is trained with our learning procedure, it outperforms all the state-of-the-art models on three common informal language tasks.
Highlights
Because of the noisy nature of informal language and the shortage of labeled data, progress on informal language is not as promising as on formal language.
DA means the model is trained with the data annealing procedure.
When LSTM, BERTBASE, and BERTLARGE are used as the training model under our data annealing procedure, they achieve better performance than under other transfer learning paradigms.
Summary
Because of the noisy nature of informal language and the shortage of labeled data, progress on informal language is not as promising as on formal language. Many tasks on formal data obtain high performance thanks to deep neural models (Peters et al., 2018; Devlin et al., 2018). The excellent performance of these state-of-the-art models usually fails to transfer directly to informal data. A gradually decayed learning rate gives the model more freedom to explore at the beginning and leads to better model performance (Zeiler, 2012; Yang and Zhang, 2018; Devlin et al., 2018). Another widespread implementation of annealing is simulated annealing (Bertsimas and Tsitsiklis, 1993). Experiments validate the effectiveness of our data annealing procedure when training resources in the target data are limited.
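The data annealing schedule described in the abstract (mainly formal text at first, with the proportion of informal text gradually increasing) can be illustrated with a minimal sketch. The text above does not give the form of the schedule, so the exponential decay of the formal-data proportion is an assumption made here for illustration. The names formal_ratio and annealed_batch and the parameters initial_ratio and decay are likewise hypothetical.

```python
import random


def formal_ratio(step, initial_ratio=0.9, decay=0.95):
    """Fraction of a batch drawn from formal data at a 1-indexed training step.

    Assumption: the ratio decays exponentially, so formal data dominates
    early batches and informal data dominates later ones.
    """
    return initial_ratio * decay ** (step - 1)


def annealed_batch(formal, informal, step, batch_size, rng,
                   initial_ratio=0.9, decay=0.95):
    """Sample one mixed batch whose formal share follows formal_ratio."""
    n_formal = round(batch_size * formal_ratio(step, initial_ratio, decay))
    n_informal = batch_size - n_formal
    # Sample with replacement so small datasets do not run out of examples.
    batch = [rng.choice(formal) for _ in range(n_formal)]
    batch += [rng.choice(informal) for _ in range(n_informal)]
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    formal_data = [("formal", i) for i in range(100)]      # placeholder examples
    informal_data = [("informal", i) for i in range(100)]  # placeholder examples
    for step in (1, 10, 50):
        batch = annealed_batch(formal_data, informal_data, step, 10, rng)
        n_formal = sum(1 for kind, _ in batch if kind == "formal")
        print(step, n_formal, len(batch) - n_formal)
```

In a real training loop, each batch would be passed to the model (an LSTM or BERT, for example) in place of the print call. The values of initial_ratio and decay would need tuning per task and are not taken from the source.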