Abstract

Chinese word segmentation plays an important role in search engines, artificial intelligence, machine translation, and other applications. There are currently three main families of word segmentation algorithms: dictionary-based, statistics-based, and understanding-based. However, few approaches combine two or all three of these methods. Therefore, a Chinese word segmentation model is proposed that combines a statistics-based algorithm with an understanding-based algorithm: it merges Hidden Markov Model (HMM) segmentation with Bi-LSTM segmentation to improve accuracy. The main method is to compute lexical statistics over the outputs of the two segmenters, select the better result according to those statistics, and combine the selections into the final segmentation. The combined model is evaluated on the MSRA corpus provided by Bakeoff. Experiments show that the accuracy of its segmentation results is 12.52% higher than that of the traditional HMM model and 0.19% higher than that of the Bi-LSTM model.
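The abstract does not specify which lexical statistics are used to arbitrate between the two segmenters, so the following is only a minimal sketch of one plausible reading: score each candidate segmentation with a corpus word-frequency table (an assumed stand-in for the paper's statistics) and keep the higher-scoring output. All function names, the frequency table, and the toy segmenter outputs are hypothetical.

```python
from collections import Counter
from typing import List

def lexical_score(words: List[str], freq: Counter) -> float:
    """Average smoothed relative frequency of a segmentation's words.
    Unknown words receive a +1 smoothing count so they are not rejected outright."""
    if not words:
        return 0.0
    total = sum(freq.values()) or 1
    return sum((freq[w] + 1) / (total + len(freq)) for w in words) / len(words)

def combine(hmm_words: List[str], bilstm_words: List[str], freq: Counter) -> List[str]:
    """Choose whichever segmentation scores better under the lexical statistics."""
    return max((hmm_words, bilstm_words), key=lambda ws: lexical_score(ws, freq))

# Toy usage with hypothetical segmenter outputs and a tiny frequency table.
freq = Counter({"研究": 120, "生命": 80, "研究生": 40, "命": 5, "的": 300, "起源": 60})
hmm_out = ["研究生", "命", "的", "起源"]     # hypothetical HMM output
bilstm_out = ["研究", "生命", "的", "起源"]   # hypothetical Bi-LSTM output
print(combine(hmm_out, bilstm_out, freq))     # -> ['研究', '生命', '的', '起源']
```

In this sketch the arbitration is done per sentence; the paper's model may instead select at a finer granularity before merging into the final result.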
