Abstract

There has been recent interest in statistical approaches to Korean morphological analysis. However, previous studies have been based mostly on generative models, including a hidden Markov model (HMM), without utilizing discriminative models such as a conditional random field (CRF). We present a two-stage discriminative approach based on CRFs for Korean morphological analysis. Similar to methods used for Chinese, we perform two disambiguation procedures based on CRFs: (1) morpheme segmentation and (2) POS tagging. In morpheme segmentation, an input sentence is segmented into sequences of morphemes, where a morpheme unit is either atomic or compound. In the POS tagging procedure, each morpheme (atomic or compound) is assigned a POS tag. Once POS tagging is complete, we carry out a post-processing of the compound morphemes, where each compound morpheme is further decomposed into atomic morphemes, which is based on pre-analyzed patterns and generalized HMMs obtained from the given tagged corpus. Experimental results show the promise of our proposed method.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.