Abstract

The general language model BERT, pre-trained on the cross-domain text corpora BookCorpus and Wikipedia, achieves excellent performance on a range of natural language processing tasks when fine-tuned for downstream tasks. However, it still lacks the task-specific and domain-related knowledge needed to further improve performance, and more detailed analyses of fine-tuning strategies are necessary. To address these problems, a BERT-based text classification model, BERT4TC, is proposed: it constructs an auxiliary sentence to turn the classification task into a binary sentence-pair one, aiming to address the limited-training-data and task-awareness problems. The architecture and implementation details of BERT4TC are presented, as well as a post-training approach for addressing BERT's domain challenge. Finally, extensive experiments are conducted on seven widely studied public datasets to analyze fine-tuning strategies from the perspectives of learning rate, sequence length, and hidden state vector selection; BERT4TC models with different auxiliary sentences and post-training objectives are then compared and analyzed in depth. The results show that BERT4TC with a suitable auxiliary sentence significantly outperforms both typical feature-based methods and fine-tuning methods, achieving new state-of-the-art performance on the multi-class classification datasets. On the binary sentiment classification datasets, our BERT4TC post-trained with a suitable domain-related corpus also achieves better results than the original BERT model.
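
To make the auxiliary-sentence construction concrete, the sketch below shows how a k-class example can be expanded into k binary sentence-pair instances, one per candidate label. The template wording and helper name are assumptions for illustration, not the paper's exact construction.

    # A minimal Python sketch of the auxiliary-sentence idea: one k-class
    # example is expanded into k binary sentence-pair examples, one per
    # candidate label. The template wording is a hypothetical placeholder.

    def build_sentence_pairs(text, true_label, all_labels):
        """Expand one k-class example into k binary sentence-pair examples."""
        pairs = []
        for label in all_labels:
            auxiliary = f"The category of this text is {label}."  # assumed template
            target = 1 if label == true_label else 0  # 1 = matching pair
            pairs.append((text, auxiliary, target))
        return pairs

    # A 3-class topic example becomes 3 binary sentence-pair instances.
    pairs = build_sentence_pairs(
        "The team won the championship last night.",
        true_label="sports",
        all_labels=["sports", "business", "politics"],
    )

Each pair is then fed to BERT as a standard sentence pair, so the label information enters the input itself rather than living only in the classification head.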

Highlights

  • Text classification is a classic task and an active research topic in natural language processing (NLP), aiming to assign pre-defined categories to a given text sequence

  • Pre-training the general language model BERT on a large-scale unlabeled corpus and fine-tuning it on downstream tasks has achieved state-of-the-art results on multiple NLP tasks

  • We propose a BERT-based text classification model that constructs an auxiliary sentence to turn the task into a sentence-pair one, aiming to incorporate more task-specific knowledge and address the task-awareness challenge


Summary

INTRODUCTION

Text classification is a classic task and an active research topic in natural language processing (NLP), aiming to assign pre-defined categories to a given text sequence. To address the challenge of polysemy, some scholars proposed contextualized word vectors, e.g., CoVe [10] and ELMo (Embeddings from Language Models) [11], which learn multiple vectors for a word according to the different contexts in which it appears. ELMo uses vectors derived from a bidirectional LSTM trained with a coupled language model (LM) objective on a large text corpus, and integrates these contextual word vectors with existing task-specific supervised neural architectures. Both CoVe and ELMo successfully generalize traditional word vectors to capture context-sensitive features, but the learned representations are still typically used as features in a downstream model [12]. In the past two years, to avoid heavily engineered task-specific structures and to greatly reduce the number of parameters learned from scratch, some scholars have pursued another direction: pre-training a general language model on a large-scale unsupervised text corpus and fine-tuning it on downstream tasks.
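
As a concrete illustration of this pre-train/fine-tune paradigm, the sketch below runs one fine-tuning step of a pre-trained BERT with a freshly initialized classification head, using the Hugging Face transformers library. It is an assumed minimal setup (model name, hyperparameters, and toy example are illustrative), not the paper's original implementation.

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    # Load pre-trained weights; only the 2-way classification head is new and
    # learned from scratch, while the encoder parameters are reused as-is.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # One fine-tuning step on a toy labeled example (1 = positive sentiment).
    enc = tokenizer("a gripping, beautifully shot film",
                    return_tensors="pt", truncation=True, max_length=128)
    labels = torch.tensor([1])
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()

Because nearly all parameters arrive pre-trained, only a small learning rate and a few epochs are typically needed, which is what makes the fine-tuning strategies analyzed in this paper (learning rate, sequence length, hidden state selection) so consequential.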

RELATED WORK
BERT4TC
BERT ENCODER
OUTPUT LAYER
EXPERIMENTS
EXPERIMENT 1
EXPERIMENT 2
EXPERIMENT 3
EXPERIMENT 4
EXPERIMENT 5
EXPERIMENT 6
Findings
CONCLUSION