Abstract

In well-spaced Korean sentences, morphological analysis is the first step in natural language processing, in which a Korean sentence is segmented into a sequence of morphemes and the parts of speech of the segmented morphemes are determined. Named entity recognition is a natural language processing task carried out to obtain morpheme sequences with specific meanings, such as person, location, and organization names. Although morphological analysis and named entity recognition are closely associated with each other, they have been independently studied and have exhibited the inevitable error propagation problem. Hence, we propose an integrated model based on label attention networks that simultaneously performs morphological analysis and named entity recognition. The proposed model comprises two layers of neural network models that are closely associated with each other. The lower layer performs a morphological analysis, whereas the upper layer performs a named entity recognition. In our experiments using a public gold-labeled dataset, the proposed model outperformed previous state-of-the-art models used for morphological analysis and named entity recognition. Furthermore, the results indicated that the integrated architecture could alleviate the error propagation problem.

Highlights

  • In Korean, morphological analysis (MA) is generally performed in the order of morpheme segmentation and part-of-speech (POS) tagging

  • A morpheme is the smallest meaningful unit in a phrase

  • To obtain optimal label paths better than those obtained with conditional random fields (CRFs), a label attention network (LAN) was proposed, which captured the potential long-term label dependency by providing incrementally refined label distributions with hierarchical attention to each word
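
The label attention idea in the last highlight can be sketched in a few lines. The following is a minimal single-layer illustration, not the authors' implementation: it assumes single-head scaled dot-product attention, and the function name `label_attention`, the toy label set, and the hand-picked two-dimensional vectors are all hypothetical. Each word state attends over a table of label embeddings; the attention weights act as a distribution over labels, and the weighted sum of label embeddings is the label-aware vector that a further layer could refine.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(word_states, label_embeddings):
    """Return (label distribution, attended label vector) for each word state.

    Assumption: single-head scaled dot-product attention, with word states
    as queries and label embeddings as both keys and values.
    """
    dim = len(label_embeddings[0])
    scale = math.sqrt(dim)
    outputs = []
    for state in word_states:
        # Similarity between this word state and every label embedding.
        scores = [
            sum(s * e for s, e in zip(state, emb)) / scale
            for emb in label_embeddings
        ]
        dist = softmax(scores)
        # Weighted sum of label embeddings under the label distribution.
        attended = [
            sum(p * emb[k] for p, emb in zip(dist, label_embeddings))
            for k in range(dim)
        ]
        outputs.append((dist, attended))
    return outputs


# Toy data (hypothetical): three POS labels and two word states.
LABELS = ["NNP", "JKB", "VV"]
label_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
word_states = [[2.0, 0.1], [0.0, 3.0]]

for dist, attended in label_attention(word_states, label_embeddings):
    best = LABELS[max(range(len(dist)), key=dist.__getitem__)]
    print(best, [round(p, 3) for p in dist])
```

With these toy vectors the most probable label is NNP for the first state and JKB for the second. A hierarchical variant would presumably combine each attended vector with its word state and feed the result to the next layer, so that the label distribution is refined step by step; that stacking is omitted here.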

Summary

Introduction

In Korean, morphological analysis (MA) is generally performed in the order of morpheme segmentation and part-of-speech (POS) tagging. Many named entity recognition (NER) models use the results of MA as informative clues [1,2]. This pipeline architecture causes the well-known error propagation problem: errors made by the MA model are passed on to the NER model. MA models for agglutinative languages, such as Korean and Japanese, perform worse than those for isolating languages, which significantly affects the performance of the corresponding NER models.

Sci. 2020, 10, 3740

[...] capitalization, detecting named entities (NEs) without any morphological information, such as morpheme boundaries and POS tags, is difficult.
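
The role of morpheme boundaries in NE detection can be illustrated with a small example. This is a sketch under stated assumptions rather than the paper's method: the helper `extract_entities`, the BIO tagging scheme, the `LOC` label, and the Sejong-style POS tags are chosen here for illustration only. In the phrase 서울에 갔다 ("went to Seoul"), the location name 서울 is attached to the particle 에 within a single space-delimited word, so the entity boundary is visible only after morpheme segmentation.

```python
def extract_entities(morphemes, ne_tags):
    """Collect (entity_text, entity_type) pairs from BIO tags over morphemes.

    `morphemes` is a list of (surface_form, pos_tag) pairs and `ne_tags`
    holds one BIO tag per morpheme. Entity text is built by joining the
    surface forms without spaces, which is a simplification.
    """
    entities = []
    current, current_type = [], None
    for (form, _pos), tag in zip(morphemes, ne_tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [form], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # The open entity continues with this morpheme.
            current.append(form)
        else:
            # "O" or an inconsistent "I-" tag closes the open entity.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


# Lower-layer output (MA): morphemes with assumed Sejong-style POS tags.
morphemes = [("서울", "NNP"), ("에", "JKB"), ("가", "VV"), ("았", "EP"), ("다", "EF")]
# Upper-layer output (NER): one assumed BIO tag per morpheme.
ne_tags = ["B-LOC", "O", "O", "O", "O"]

print(extract_entities(morphemes, ne_tags))  # [('서울', 'LOC')]
```

If the segmenter failed to split 서울에 into 서울 and 에, no morpheme-level tag sequence could isolate the location name, which is one way an MA error carries over into NER.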

Correct Results
Previous Studies
Integrated Model for MA and NER
Morpheme
Datasets and Experimental Setups
Implementation
Experimental Results
Conclusions