Abstract

Social media is an important source of textual information; however, its informal and unstructured nature makes it difficult for traditional named entity recognition (NER) methods to achieve high accuracy. This paper proposes a new method for social media named entity recognition with data augmentation. First, we pre-train the language model using Bidirectional Encoder Representations from Transformers (BERT) to obtain a semantic vector for each word based on its context. Then, we obtain similar entities via data augmentation and perform substitution or semantic transformation on these entities. The augmented input is then fed into a Bi-LSTM model for training, and the outputs are fused and fine-tuned to obtain the best labels. In addition, a self-attention layer captures the essential information in the features and reduces reliance on external information. Experimental results on the WNUT16, WNUT17, and OntoNotes 5.0 datasets confirm the effectiveness of the proposed model.
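The entity-substitution augmentation step described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the lexicon, function name, and BIO label scheme are assumptions, and in the actual method similar entities would come from semantic similarity (e.g., BERT embedding neighbors) rather than a hand-written dictionary.

```python
import random

# Toy lexicon of interchangeable entities by type (an assumption for
# illustration); the paper derives similar entities automatically.
SIMILAR = {
    "PER": ["Alice", "Bob", "Carol"],
    "LOC": ["Paris", "London", "Tokyo"],
}

def augment(tokens, labels, seed=0):
    """Mention replacement: swap each labeled entity token for another
    entity of the same type, leaving the BIO label sequence unchanged
    so the augmented sentence is still correctly annotated."""
    rng = random.Random(seed)
    out = []
    for tok, lab in zip(tokens, labels):
        etype = lab.split("-")[-1]  # "B-LOC" -> "LOC", "O" -> "O"
        if lab != "O" and etype in SIMILAR:
            out.append(rng.choice(SIMILAR[etype]))
        else:
            out.append(tok)
    return out, list(labels)

new_tokens, new_labels = augment(
    ["Alice", "visited", "Paris"], ["B-PER", "O", "B-LOC"]
)
```

Because only same-type substitutions are made, the label sequence stays valid and the model sees more entity surface forms for each context.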
