Abstract

Named entity recognition (NER) is an important task in natural language processing that requires determining entity boundaries and classifying entities into predefined categories. Most state-of-the-art NER systems need tens of thousands of annotated sentences to achieve high performance, an amount rarely available for low-resource languages. For Uyghur and Hungarian (the UH languages) in particular, very little annotated NER data exists. Each task also has its own specificities: differences in vocabulary and word order across languages make the problem challenging. In this paper, we present an effective way to obtain a meaningful, easy-to-use feature extractor for NER: fine-tuning a pre-trained language model. Specifically, we propose a fine-tuning method for low-resource languages that first constructs a fine-tuning dataset through data augmentation, then adds a dataset from a high-resource language, and finally fine-tunes a cross-lingual pre-trained model on the combined dataset. In addition, we propose an attention-based fine-tuning strategy that uses symmetry to better select relevant semantic and syntactic information from pre-trained language models and applies these symmetry features to NER. We evaluated our approach on Uyghur and Hungarian datasets, where it compared favorably with several strong baselines. We close with an overview of the available resources for named entity recognition and some open research questions.
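The abstract describes the fine-tuning dataset in three steps but does not say which augmentation technique is used. The sketch below assumes mention replacement (swapping an entity span for another span of the same type drawn from the training data), a common NER augmentation, and assumes both languages share one BIO tag set. The function names (`extract_entities`, `replace_mentions`, `build_finetuning_set`) and the Hungarian sample sentences are hypothetical illustrations, not taken from the paper; the final step, fine-tuning the cross-lingual pre-trained model, is not shown.

```python
import random
from typing import Dict, List, Optional, Tuple

# A sentence is a list of (token, BIO tag) pairs, e.g. ("Budapesten", "B-LOC").
Sentence = List[Tuple[str, str]]
EntityPool = Dict[str, List[List[str]]]


def extract_entities(sentences: List[Sentence]) -> EntityPool:
    """Collect entity token spans, grouped by type, from well-formed BIO data."""
    pool: EntityPool = {}
    for sent in sentences:
        span: List[str] = []
        etype: Optional[str] = None
        for token, tag in sent + [("", "O")]:  # sentinel "O" flushes a trailing entity
            if tag.startswith("I-") and tag[2:] == etype:
                span.append(token)  # continue the current entity
                continue
            if span:  # any other tag closes the current entity
                pool.setdefault(etype, []).append(span)
            span, etype = ([token], tag[2:]) if tag.startswith("B-") else ([], None)
    return pool


def replace_mentions(sent: Sentence, pool: EntityPool,
                     rng: random.Random, p: float = 0.5) -> Sentence:
    """Assumed augmentation (mention replacement): with probability p, swap each
    entity span for a random same-type span from the pool, re-tagged as B-/I-."""
    out: Sentence = []
    i = 0
    while i < len(sent):
        token, tag = sent[i]
        if not tag.startswith("B-"):
            out.append((token, tag))
            i += 1
            continue
        etype = tag[2:]
        j = i + 1
        while j < len(sent) and sent[j][1] == f"I-{etype}":
            j += 1  # find the end of the entity span
        span = [t for t, _ in sent[i:j]]
        if pool.get(etype) and rng.random() < p:
            span = rng.choice(pool[etype])
        out.append((span[0], f"B-{etype}"))
        out.extend((t, f"I-{etype}") for t in span[1:])
        i = j
    return out


def build_finetuning_set(low: List[Sentence], high: List[Sentence],
                         n_aug: int = 1, seed: int = 0) -> List[Sentence]:
    """Step 1: augment the low-resource data; step 2: add high-resource data.
    Step 3 (fine-tuning a cross-lingual model on the result) is not shown."""
    rng = random.Random(seed)
    pool = extract_entities(low)
    augmented = [replace_mentions(s, pool, rng) for _ in range(n_aug) for s in low]
    combined = low + augmented + high
    rng.shuffle(combined)
    return combined


if __name__ == "__main__":
    low = [[("Kovács", "B-PER"), ("János", "I-PER"), ("Budapesten", "B-LOC"),
            ("dolgozik", "O"), (".", "O")],
           [("Szabó", "B-PER"), ("Anna", "I-PER"), ("Debrecenbe", "B-LOC"),
            ("utazott", "O"), (".", "O")]]
    high = [[("John", "B-PER"), ("lives", "O"), ("in", "O"),
             ("London", "B-LOC"), (".", "O")]]
    for sent in build_finetuning_set(low, high):
        print(" ".join(f"{tok}/{tag}" for tok, tag in sent))
```

Because Uyghur and Hungarian are agglutinative, naive mention replacement can carry a case suffix into a context where it does not fit, for example `Budapesten` ("in Budapest") swapped into a slot that expects an illative form like `Debrecenbe` ("to Debrecen"). A practical pipeline would likely need to handle such suffixes separately.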

Highlights

  • We summarize related work on three topics: (1) data augmentation; (2) cross-lingual pre-trained language models; and (3) self-attention.

Introduction

With the popularization and rapid development of information technology, natural language processing (NLP) plays a key role in processing, understanding, and applying the vast amount of unstructured text generated on the Internet. Named entity recognition (NER) is one of the fundamental research tasks in NLP and is central to the automatic processing and understanding of natural language by computers. The automatic extraction of semantic information [1] from natural language text is becoming increasingly important. Because named entities carry key semantic information, NER has gradually become a basic research problem in NLP since it was first proposed at the Sixth Message Understanding Conference (MUC-6) in the 1990s. NER can be used as an independent tool for information extraction, and it plays an important role in many areas of natural language text processing, such as automatic text summarization, question answering, machine translation, knowledge base construction, and machine reading comprehension.
