Abstract

Word segmentation of ancient Chinese text is a foundational task in the intelligent processing of ancient books. In this paper, an unsupervised lexicon construction algorithm based on the minimum entropy model is applied to a large-scale corpus of ancient texts to extract a lexicon of frequently co-occurring adjacent characters. The lexicon is then combined with existing word segmentation tools in segmentation experiments on ancient texts. The experimental results show that the method improves segmentation to different degrees for texts from different periods, which indicates that the lexicon is effective within a certain range. In addition, this paper is one of the few works to apply monolingual word segmentation methods to ancient Chinese word segmentation, and it enriches research in related fields.
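
To make the idea of extracting frequently co-occurring adjacent characters concrete, the following is a minimal sketch and not the paper's implementation. It assumes that association between neighboring characters is scored with pointwise mutual information (PMI), that text is cut wherever the score is weak, and that the surviving multi-character fragments become lexicon candidates. The function name `build_lexicon` and the thresholds `min_count` and `min_pmi` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def build_lexicon(corpus, min_count=2, min_pmi=1.0):
    """Collect multi-character candidates by cutting each line wherever
    two adjacent characters are only weakly associated.

    Assumption: PMI stands in for the association score; the thresholds
    are illustrative, not values taken from the paper.
    """
    chars = Counter()
    pairs = Counter()
    for line in corpus:
        chars.update(line)
        pairs.update(line[i:i + 2] for i in range(len(line) - 1))
    total_chars = sum(chars.values())
    total_pairs = sum(pairs.values())

    def pmi(pair):
        # log( P(ab) / (P(a) * P(b)) ) for the adjacent pair "ab"
        p_ab = pairs[pair] / total_pairs
        p_a = chars[pair[0]] / total_chars
        p_b = chars[pair[1]] / total_chars
        return math.log(p_ab / (p_a * p_b))

    # Pairs that are both frequent enough and strongly associated.
    strong = {p for p, c in pairs.items()
              if c >= min_count and pmi(p) >= min_pmi}

    candidates = Counter()
    for line in corpus:
        if not line:
            continue
        word = line[0]
        for i in range(1, len(line)):
            if line[i - 1:i + 1] in strong:
                word += line[i]          # keep strongly linked neighbors together
            else:
                candidates[word] += 1    # weak link: cut here
                word = line[i]
        candidates[word] += 1

    # Keep only multi-character fragments seen often enough.
    return {w: c for w, c in candidates.items()
            if len(w) > 1 and c >= min_count}


if __name__ == "__main__":
    toy_corpus = ["天下之君子", "君子之道", "天下有道", "君子也"]
    print(build_lexicon(toy_corpus))
```

In a setting like the one the abstract describes, the resulting entries would then be supplied to an existing segmentation tool as additional vocabulary. How that is done depends on the tool and is not shown here.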
