Chinese word segmentation as morpheme-based lexical chunking

Guohong Fu, Chunyu Kit, Jonathan J Webster

doi:10.1016/j.ins.2008.01.001

Abstract

Chinese word segmentation plays an important role in many Chinese language processing tasks, such as information retrieval and text mining. Recent research on Chinese word segmentation has focused on tagging approaches that use either characters or words as tagging units. In this paper, we present a morpheme-based chunking approach and implement it in a two-stage system. The system consists of two main components: a morpheme segmentation component, which segments an input sentence into a sequence of morphemes based on morpheme-formation models and bigram language models, and a lexical chunking component, which labels each segmented morpheme's position in a word of a special type with the aid of lexicalized hidden Markov models. To facilitate these tasks, we also develop a statistical technique for automatically compiling a morpheme dictionary from a segmented or tagged corpus. To evaluate this approach, we conduct a closed test and an open test using the 2005 SIGHAN Bakeoff data. Our system demonstrates state-of-the-art performance on different test sets, showing the benefits of choosing morphemes as tagging units. Furthermore, the open test results indicate a significant performance improvement from lexicalization and part-of-speech features.
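To make the chunking idea concrete, here is a minimal sketch of how position labels on morphemes could be turned back into words. It is not the paper's implementation: the tag set (B, M, E, S), the function name chunk_morphemes, and the example sentence are all assumptions made for illustration, and the sketch omits the word-type part of the labels as well as the hidden Markov model that would predict them.

```python
from typing import List, Tuple

# Hypothetical position tags (an assumption, not the paper's tag set):
#   B = morpheme begins a word      M = morpheme is inside a word
#   E = morpheme ends a word        S = morpheme forms a word by itself

def chunk_morphemes(tagged: List[Tuple[str, str]]) -> List[str]:
    """Merge (morpheme, position_tag) pairs into a list of words."""
    words: List[str] = []
    buffer = ""
    for morpheme, tag in tagged:
        # A B or S tag starts a new word, so flush any unfinished one first.
        if tag in ("B", "S") and buffer:
            words.append(buffer)
            buffer = ""
        buffer += morpheme
        # An E or S tag closes the current word.
        if tag in ("E", "S"):
            words.append(buffer)
            buffer = ""
    # Keep a trailing unfinished word instead of silently dropping it.
    if buffer:
        words.append(buffer)
    return words


# Illustrative input: the output of a hypothetical morpheme segmenter,
# already labeled with position tags.
tagged = [("研究", "B"), ("生", "E"), ("学习", "S"), ("汉语", "S")]
print(chunk_morphemes(tagged))  # ['研究生', '学习', '汉语']
```

In this sketch the tagger's output is taken as given; in a two-stage system of the kind the abstract describes, the tags would come from the lexical chunking component and the morphemes from the segmentation component.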

Journal: Information Sciences
Publication Date: Jan 8, 2008
Citations: 33

