Abstract

Segmenting Chinese text into words is a difficult problem, because written Chinese does not mark word boundaries with spaces. This paper presents a framework for a Chinese Internet search engine and discusses the characteristics and difficulties of segmenting Chinese text in that setting. It concludes that correct segmentation is the most important requirement. It then proposes strategies for resolving ambiguous segmentation strings and for handling new (unknown) words and stop words, and presents methods for keeping Chinese segmentation consistent.
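The abstract does not say which segmentation algorithm the paper uses. As a hedged illustration only, the sketch below uses dictionary-based forward and backward maximum matching, a common baseline, to show what an ambiguous segmentation string looks like. When the two passes disagree, the string is flagged as ambiguous. The toy dictionary, stop-word list, function names (`forward_mm`, `backward_mm`, `segment`) and the tie-breaking rule are all hypothetical and are not taken from the paper.

```python
# Hypothetical toy dictionary and stop-word list, for illustration only.
DICT = {"研究", "研究生", "生命", "命", "起源", "的"}
STOP_WORDS = {"的"}
MAX_LEN = max(len(w) for w in DICT)


def forward_mm(text, dictionary=DICT, max_len=MAX_LEN):
    """Forward maximum matching: take the longest dictionary word from the left."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_mm(text, dictionary=DICT, max_len=MAX_LEN):
    """Backward maximum matching: take the longest dictionary word from the right."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)
                j -= size
                break
    return words


def segment(text):
    """Return (words without stop words, whether the two passes disagreed)."""
    fwd, bwd = forward_mm(text), backward_mm(text)
    ambiguous = fwd != bwd
    # Assumed heuristic: prefer the result with fewer words; on a tie, use backward.
    best = fwd if len(fwd) < len(bwd) else bwd
    return [w for w in best if w not in STOP_WORDS], ambiguous


if __name__ == "__main__":
    # "研究生命的起源" ("study the origin of life"): forward matching wrongly
    # produces 研究生 ("graduate student"); backward matching does not.
    print(forward_mm("研究生命的起源"))
    print(backward_mm("研究生命的起源"))
    print(segment("研究生命的起源"))
```

Comparing both directions is only a cheap way to detect ambiguity. Resolving it reliably generally needs more context, such as word frequencies or a statistical model.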