Abstract

Word segmentation of Ancient Chinese text is a foundational task in the intelligent processing of ancient books. This paper applies an unsupervised lexicon construction algorithm based on the minimum entropy model to a large-scale corpus of ancient texts and extracts a lexicon of adjacent characters that frequently co-occur. The lexicon is then combined with existing word segmentation tools in segmentation experiments on ancient texts. The results show that the size of the improvement varies with the period of the text, which indicates that the lexicon is effective only within a certain range. This paper is also one of the few works to apply monolingual word segmentation methods to Ancient Chinese, and it adds to the research in this area.
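
The idea of extracting a lexicon from frequently co-occurring adjacent characters can be illustrated with a minimal sketch. It is not the paper's implementation: it substitutes a simple pointwise mutual information (PMI) cut for the full minimum entropy procedure, and the function name `build_lexicon`, the thresholds `min_count` and `min_pmi`, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_lexicon(texts, min_count=2, min_pmi=1.0):
    """Hypothetical sketch: join adjacent characters that co-occur strongly.

    Adjacent characters are kept together when their pair is frequent and
    has high PMI; the text is cut everywhere else, and the multi-character
    pieces that occur often enough form the lexicon.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for text in texts:
        char_counts.update(text)
        pair_counts.update(text[i:i + 2] for i in range(len(text) - 1))

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    def pmi(pair):
        # log( P(ab) / (P(a) * P(b)) ), estimated from raw counts
        p_pair = pair_counts[pair] / total_pairs
        p_a = char_counts[pair[0]] / total_chars
        p_b = char_counts[pair[1]] / total_chars
        return math.log(p_pair / (p_a * p_b))

    # Pairs strong enough to stay joined (thresholds are assumed values).
    strong = {
        pair for pair, count in pair_counts.items()
        if count >= min_count and pmi(pair) >= min_pmi
    }

    lexicon = Counter()
    for text in texts:
        if not text:
            continue
        word = text[0]
        for i in range(1, len(text)):
            if text[i - 1:i + 1] in strong:
                word += text[i]      # strong link: extend the current word
            else:
                lexicon[word] += 1   # weak link: cut here
                word = text[i]
        lexicon[word] += 1

    # Keep only multi-character entries that appear often enough.
    return {w: c for w, c in lexicon.items() if len(w) > 1 and c >= min_count}


corpus = ["天下甲", "乙天下", "天下丙", "丁天下"]  # toy data, not from the paper
print(build_lexicon(corpus))  # {'天下': 4}
```

A lexicon built this way could then be supplied to an existing segmenter through its user-dictionary mechanism, for example `add_word` in jieba; the abstract does not name the tools it used, so that choice is also an assumption.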
