Abstract

Word segmentation is an essential step in Chinese language processing and often plays a decisive role in the quality of downstream results. Domain vocabulary is characterized by the rapid emergence of new words, wide coverage, and large size, which creates great difficulty for word segmentation and follow-up work in the field. This paper designs and implements a system to address this. The system builds a corpus from domain literature, trains a word-level statistical language model, and uses the Viterbi algorithm to obtain preliminary Chinese word segmentation results. A word optimization algorithm then refines the preliminary results. On top of the segmentation results, the system provides users with functions such as keyword extraction, word frequency statistics, and word cloud drawing, thereby supporting Chinese word segmentation and text analysis of domain documents. The experimental results show that a system built on these ideas improves text analysis efficiency and helps promote related research on document and text processing in the field.
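As an illustration of the segmentation step described above, the following is a minimal sketch of Viterbi decoding over a word-level model. It is not the paper's implementation: it assumes a unigram model with add-one smoothing, a maximum word length of 4 characters, and a tiny toy corpus, and the names `train_unigram` and `viterbi_segment` are hypothetical. The word optimization step is not shown, since the abstract does not describe it.

```python
import math
from collections import Counter


def train_unigram(segmented_sentences):
    """Count word occurrences in an already-segmented corpus (assumed input)."""
    counts = Counter(w for sent in segmented_sentences for w in sent)
    return counts, sum(counts.values())


def viterbi_segment(text, counts, total, max_len=4):
    """Return the highest-probability segmentation under a unigram model."""
    n = len(text)
    vocab_size = len(counts)
    best = [float("-inf")] * (n + 1)  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            c = counts.get(word, 0)
            # Unseen multi-character strings are not candidate words;
            # unseen single characters are kept so every text is segmentable.
            if c == 0 and len(word) > 1:
                continue
            # Add-one smoothing (an assumption, not taken from the paper).
            logp = math.log((c + 1) / (total + vocab_size + 1))
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Toy corpus for illustration only.
corpus = [["中文", "分词"], ["文本", "分析"], ["中文", "文本"]]
counts, total = train_unigram(corpus)
print(viterbi_segment("中文文本分析", counts, total))
```

The same `Counter` produced during training can also serve the word frequency statistics mentioned above, for example through `counts.most_common(10)`.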
