Abstract

Named entity recognition (NER) is a natural language processing task to identify spans that mention named entities and to annotate them with predefined named entity classes. Although many NER models based on machine learning have been proposed, their performance on fine-grained NER tasks has been unsatisfactory. This is because the training data of a fine-grained NER task are much more unbalanced than those of a coarse-grained NER task. To overcome the problem presented by unbalanced data, we propose a fine-grained NER model that compensates for the sparseness of fine-grained NEs by using the contextual information of coarse-grained NEs. Separately, many NER models have used different levels of features, such as part-of-speech tags and gazetteer look-up results, in a nonhierarchical manner. Unfortunately, these models suffer from the feature interference problem. Our solution to this problem is to adopt a multi-stacked feature fusion scheme, which accepts different levels of features as its input. The proposed model is based on multi-stacked long short-term memories (LSTMs) with a multi-stacked feature fusion layer for acquiring multilevel embeddings and a dual-stacked output layer for predicting fine-grained NEs based on the categorical information of coarse-grained NEs. Our experiments indicate that the proposed model achieves state-of-the-art performance. The results show that the proposed model can effectively alleviate the unbalanced data problem that frequently occurs in a fine-grained NER task. In addition, the multi-stacked feature fusion layer contributes to the improvement of NER performance, confirming that the proposed model can alleviate the feature interference problem. Based on these experimental results, we conclude that the proposed model is well designed to effectively perform NER tasks.
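The two architectural ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: plain `tanh` layers stand in for the stacked LSTMs, and all dimensions, weight matrices, and feature names (`word_emb`, `pos_emb`, `gaz_emb`) are hypothetical. The sketch shows (1) multi-stacked feature fusion, where each stacked layer fuses one feature level with the previous layer's output instead of concatenating all features at the input, and (2) a dual-stacked output layer, where the fine-grained prediction is conditioned on the coarse-grained distribution.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Toy dimensions (all hypothetical)
n_tokens, d_word, d_pos, d_gaz, d_hidden = 5, 16, 4, 3, 8
n_coarse, n_fine = 4, 12

word_emb = rng.normal(size=(n_tokens, d_word))  # word embeddings
pos_emb = rng.normal(size=(n_tokens, d_pos))    # POS-tag features
gaz_emb = rng.normal(size=(n_tokens, d_gaz))    # gazetteer look-up features

# Multi-stacked feature fusion: each layer fuses one feature level with
# the previous layer's output, rather than mixing all features at once
# (which can cause feature interference).
W1 = rng.normal(size=(d_word, d_hidden))
h1 = np.tanh(word_emb @ W1)                                  # layer 1: words
W2 = rng.normal(size=(d_hidden + d_pos, d_hidden))
h2 = np.tanh(np.concatenate([h1, pos_emb], axis=-1) @ W2)    # layer 2: + POS
W3 = rng.normal(size=(d_hidden + d_gaz, d_hidden))
h3 = np.tanh(np.concatenate([h2, gaz_emb], axis=-1) @ W3)    # layer 3: + gazetteer

# Dual-stacked output: predict coarse-grained classes first, then predict
# fine-grained classes from the hidden state *and* the coarse distribution.
W_c = rng.normal(size=(d_hidden, n_coarse))
p_coarse = softmax(h3 @ W_c)
W_f = rng.normal(size=(d_hidden + n_coarse, n_fine))
p_fine = softmax(np.concatenate([h3, p_coarse], axis=-1) @ W_f)

print(p_coarse.shape, p_fine.shape)  # (5, 4) (5, 12)
```

Conditioning the fine-grained layer on `p_coarse` is how the sketch mirrors the paper's idea of compensating for sparse fine-grained NEs with the (much better populated) coarse-grained categorical information.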

Highlights

  • Named entity recognition (NER), a well-known task in natural language processing (NLP), identifies word sequences in texts and classifies them into predefined categories

  • This reveals that the proposed dual-stacked output layer contributes to alleviating the problem of unbalanced training data

  • We propose a fine-grained NER model that compensates for the sparseness of fine-grained NEs by using the contextual information of coarse-grained NEs

Introduction

Named entity recognition (NER), a well-known task in natural language processing (NLP), identifies word sequences in texts and classifies them into predefined categories. Although early NER systems performed well in coarse-grained NER, they often required well-designed features in the form of language-dependent human knowledge. To address this issue, many NER systems have adopted deep learning methods and achieved state-of-the-art (SOTA) performance. Although these deep learning models delivered good performance, it was restricted to tasks involving coarse-grained classification. In fine-grained NER for English language tasks, certain systems based on deep learning performed satisfactorily, reaching between 80% and 85% accuracy. Even so, it is easy to find fine-grained NEs that seldom occur in training data.

