Abstract

Korean morphological analysis has been treated as a sequence of two tasks, morpheme processing followed by POS tagging, so a pipeline model of the tasks has been widely adopted in previous studies. However, a pipeline cannot exploit interactions between the tasks. This paper formulates Korean morphological analysis as a combination of the two tasks and presents a tied sequence-to-sequence multi-task model that trains both tasks simultaneously without any explicit regularization. Experiments show that the proposed model achieves state-of-the-art performance.
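
The abstract does not spell out the training objective. As a minimal sketch, assuming that "training the two tasks simultaneously without any explicit regularization" means minimizing the unweighted sum of the two tasks' negative log-likelihoods, a per-sentence loss could look like the following. The function names and the per-token probabilities are hypothetical; they stand in for the model's softmax outputs for the gold morpheme and POS-tag sequences.

```python
import math
from typing import Sequence


def sequence_nll(gold_token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of a gold sequence, given the probability
    the model assigns to each gold token under teacher forcing."""
    return -sum(math.log(p) for p in gold_token_probs)


def joint_loss(morph_probs: Sequence[float], pos_probs: Sequence[float]) -> float:
    """Hypothetical multi-task objective: -log p(m|x) - log p(t|m, x).

    ASSUMPTION: the two task losses are summed with equal weight and no
    extra regularization term. By the chain rule this equals
    -log p(m, t|x), so minimizing it trains both tasks jointly.
    """
    return sequence_nll(morph_probs) + sequence_nll(pos_probs)


# Toy per-token probabilities for one sentence (illustrative, not from the paper).
morph_probs = [0.9, 0.8, 0.95]  # probabilities of the gold morpheme tokens
pos_probs = [0.7, 0.85, 0.9]    # probabilities of the gold POS tags
print(f"joint loss = {joint_loss(morph_probs, pos_probs):.4f}")
```

Because both terms share one objective, gradients from POS tagging also reach any parameters shared with morpheme processing, which is the kind of interaction a pipeline trained stage by stage cannot use.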

Highlights

  • Korean is an agglutinative language (Song, 2006)

  • This paper proposes a model to train morpheme processing and POS tagging simultaneously in Korean morphological analysis

  • Korean morphological analysis is different from existing NLP tasks such as English POS tagging (Toutanova et al., 2003; Manning, 2011) and joint word segmentation and POS tagging for Chinese (Zhang and Clark, 2008; Shao et al., 2017; Chen et al., 2017)
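
One concrete difference from Chinese joint segmentation and POS tagging is that a Korean eojeol cannot always be split into its morphemes: some morphemes contract or change form on the surface and must be recovered. The sketch below checks whether a gold analysis is a pure segmentation of its eojeol. The example analyses are illustrative and follow the widely used Sejong-style tag set (e.g. 하/VV + 였/EP + 다/EF for 했다); the helper name is hypothetical and the paper's own tag set may differ.

```python
from typing import List, Tuple

Analysis = List[Tuple[str, str]]  # (morpheme, POS tag) pairs


def is_pure_segmentation(eojeol: str, analysis: Analysis) -> bool:
    """Hypothetical helper: True if the morphemes concatenate back to the
    surface eojeol, i.e. boundary placement alone would suffice."""
    return "".join(morph for morph, _ in analysis) == eojeol


# Illustrative Sejong-style analyses (assumed, not taken from the paper).
examples = {
    "나는": [("나", "NP"), ("는", "JX")],                # split only
    "했다": [("하", "VV"), ("였", "EP"), ("다", "EF")],  # 하 + 였 contract to 했
}

for eojeol, analysis in examples.items():
    kind = "segmentation" if is_pure_segmentation(eojeol, analysis) else "needs recovery"
    print(eojeol, "->", "+".join(f"{m}/{t}" for m, t in analysis), f"({kind})")
```

For 했다, concatenating the morphemes gives 하였다, not the surface string, so an analyzer has to generate the recovered forms rather than only mark boundaries.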

Summary

Introduction

Korean is an agglutinative language (Song, 2006), so analyzing the grammatical structure of eojeols, the linguistic units delimited by white space, is a fundamental step toward understanding a sentence. The goal of a Korean morphological analyzer is to decompose eojeols into morphemes and recover their original forms precisely (morpheme processing), and to assign accurate POS tags to the decomposed and/or recovered morphemes according to the context (POS tagging).

Previous studies typically first decompose and recover morphemes from eojeols, or assign so-called POSMORPH tags (Heigold et al., 2016), and then determine the actual POS tag sequence, or resolve it from the POSMORPH tags, using a sequential labeling algorithm. This pipeline model suffers from two kinds of weaknesses. This paper therefore proposes a model that trains morpheme processing and POS tagging simultaneously in Korean morphological analysis. Writing x for the input, m for its morpheme sequence, and t for the POS tag sequence, p(m|x) corresponds to morpheme processing, while p(t|m, x) corresponds to POS tagging.
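
By the chain rule, the two factors combine as p(m, t|x) = p(m|x) p(t|m, x). The toy example below illustrates a general limitation of pipelines: committing to the single best m before tagging can miss the analysis that is best for both tasks together. All candidate names and probabilities are made up for illustration; this is not the paper's decoder.

```python
from typing import Dict, Tuple

# Hypothetical candidates for one input x.
# p(m|x): morpheme-processing scores for two candidate morpheme sequences.
p_m: Dict[str, float] = {"m1": 0.55, "m2": 0.45}

# p(t|m, x): POS-tagging scores for candidate tag sequences given each m.
p_t_given_m: Dict[str, Dict[str, float]] = {
    "m1": {"t1": 0.6, "t2": 0.4},  # tagger is unsure about m1
    "m2": {"t3": 0.9, "t4": 0.1},  # tagger is confident about m2
}


def pipeline_decode() -> Tuple[str, str, float]:
    """Pick the best m first, then the best t for that fixed m."""
    m = max(p_m, key=p_m.get)
    t = max(p_t_given_m[m], key=p_t_given_m[m].get)
    return m, t, p_m[m] * p_t_given_m[m][t]


def joint_decode() -> Tuple[str, str, float]:
    """Maximize p(m, t|x) = p(m|x) * p(t|m, x) over all (m, t) pairs."""
    return max(
        ((m, t, p_m[m] * pt) for m in p_m for t, pt in p_t_given_m[m].items()),
        key=lambda triple: triple[2],
    )


print("pipeline:", pipeline_decode())
print("joint:   ", joint_decode())
```

The pipeline keeps m1 because it wins the first stage, although (m2, t3) has the higher joint probability. Exhaustive enumeration only works for toy inputs; a real system would need beam search or a similar approximation.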

Morpheme Processing and POS Tagging
Linguistic Unit in Morphological Analysis
Tied Sequence-to-Sequence Multi-Task Model
Methods
Experimental Settings
Experimental Results
Error Analysis
Conclusion