Lexical analysis for Chinese: difficulties and possible solutions

Keh‐Jiann Chen

doi:10.1080/02533839.1999.9670494

Abstract

Chinese sentences are composed of strings of characters, with no blanks to mark word boundaries. However, the basic unit of sentence processing is the word, the smallest meaningful, freely used unit in any natural language. Lexical analysis is therefore the first step in processing Chinese sentences. A lexicon is usually used to match words and to supply their syntactic and semantic information during lexical analysis. The word matching process runs into two problems: segmentation ambiguity and unknown words. This paper discusses the advantages and disadvantages of statistical and rule-based methods for resolving segmentation ambiguities. For unknown word identification, it surveys off-line word extraction methods and on-line unknown word identification strategies; the two approaches complement each other in solving the problem. The strategies and knowledge sources for implementing a practical system are also discussed.
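To make the word matching step concrete, the following is a minimal sketch of greedy forward maximum matching against a lexicon. It is a common textbook baseline and is not taken from the paper; the function name `forward_max_match`, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def forward_max_match(sentence, lexicon, max_len=4):
    """Greedy longest-match segmentation against a lexicon (illustrative only).

    Returns a list of (token, in_lexicon) pairs. A single character that
    matches no lexicon entry is emitted on its own with in_lexicon=False.
    """
    tokens = []
    i = 0
    while i < len(sentence):
        longest = min(max_len, len(sentence) - i)
        # Try the longest candidate first and shrink until the lexicon matches.
        for size in range(longest, 0, -1):
            candidate = sentence[i:i + size]
            if candidate in lexicon or size == 1:
                tokens.append((candidate, candidate in lexicon))
                i += size
                break
    return tokens


# Hypothetical toy lexicon: 研究 (research), 研究生 (graduate student),
# 生命 (life), 起源 (origin).
toy_lexicon = {"研究", "研究生", "生命", "起源"}

# The intended reading is 研究 / 生命 / 起源, but greedy matching commits
# to 研究生 first and leaves 命 stranded as an unmatched character.
result = forward_max_match("研究生命起源", toy_lexicon)
print(result)
```

In this toy run the greedy match picks 研究生 and leaves 命 without a lexicon entry, even though every word of the intended segmentation is in the lexicon. This is an instance of segmentation ambiguity, and it also shows why an unmatched character is not by itself evidence of a true unknown word.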

Journal: Journal of the Chinese Institute of Engineers
Publication Date: Jul 1, 1999
Citations: 4

Similar Papers

Unknown word extraction for Chinese documents
Keh-Jiann Chen ... Wei-Yun Ma
01 Jan 2002

A heuristic method based on a statistical approach for Chinese text segmentation
Christopher C Yang ... K W Li
Journal of the American Society for Information Science and Technology | VOL. 56
09 Sep 2005

Application of Conditional Random Fields model in unknown words identification
Hai-Jun Zhang ... Wei-Min Pan
01 Jul 2010

Chinese Unknown Word Identification Using Class-Based LM
Guohong Fu ... Kang-Kwong Luke
01 Jan 2004
