Abstract

Chinese sentences are composed with strings of characters without blanks to mark word boundaries. However, the basic processing unit for sentence processing is the word. It is the smallest meaningful, freely used unit for any natural language. Therefore lexical analysis became the first step in processing Chinese sentences. Usually a lexicon is utilized to match words and provide their syntactic and semantic information in the process of lexical analysis. During the word matching process, problems of segmentation ambiguity and occurrences of unknown words will occur. In this paper, both statistical methods and rule‐based methods are discussed for their advantages and disadvantages in solving segmentation ambiguities. For unknown word identification, off‐line word extraction methods and on‐line unknown word identification strategies are surveyed. Both methods complement each other in solving the problem. The strategies and knowledge sources for implementing a practical system are also discussed.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.