Abstract

To address incomplete semantic understanding and insufficient word position information in complex Chinese word segmentation, a Chinese word segmentation method for large-scale multilingual databases is proposed, based on hybrid deep learning. In this scheme, a bidirectional gated recurrent unit exploits context in both directions, and a BiLSTM model captures long-distance dependencies. A CRF layer accounts for the dependencies between output tags. A double-byte query method is then proposed to extend the CRF model, and the whole process is implemented through machine learning and training. The simulation results show that the hybrid model combines the strengths of BiLSTM, which captures long-distance information, and CNN, which extracts local information, and that it completes the Chinese word segmentation task more efficiently and accurately. Under the same experimental conditions, performance improves by about 30% compared with similar methods.
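As an illustration of why a CRF layer that models dependencies between output tags helps, the following is a minimal sketch of Viterbi decoding over the common BMES tagging scheme (Begin, Middle, End, Single). It is not the paper's implementation: the tag set, the hard transition constraints in `ALLOWED`, the function names `viterbi` and `segment`, and the emission scores are all assumptions chosen for the example. In a trained model the per-character scores would come from the neural encoder, and the transition scores would be learned rather than hard-coded.

```python
# Illustrative sketch only: hand-picked scores and hard BMES constraints,
# standing in for learned emission and transition scores.
TAGS = ["B", "M", "E", "S"]
NEG_INF = float("-inf")

# Assumed BMES constraints: which tags may follow each tag.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}


def viterbi(emissions):
    """Return the best valid tag path.

    emissions: list of dicts mapping tag -> score, one dict per character.
    """
    # The first character can only begin a word (B) or be a whole word (S).
    scores = {t: (emissions[0][t] if t in ("B", "S") else NEG_INF) for t in TAGS}
    backpointers = []
    for em in emissions[1:]:
        new_scores, pointers = {}, {}
        for cur in TAGS:
            best_prev, best = None, NEG_INF
            for prev in TAGS:
                if cur in ALLOWED[prev] and scores[prev] > best:
                    best_prev, best = prev, scores[prev]
            new_scores[cur] = best + em[cur] if best_prev is not None else NEG_INF
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # The last character must end a word (E) or be a whole word (S).
    last = max(("E", "S"), key=lambda t: scores[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def segment(text, tags):
    """Cut text into words at every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    text = "我爱北京"
    # Hypothetical per-character scores, as an encoder might produce.
    emissions = [
        {"B": 1.0, "M": 0.0, "E": 0.0, "S": 2.0},
        {"B": 0.5, "M": 0.0, "E": 1.0, "S": 2.0},
        {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.5},
        {"B": 0.0, "M": 0.0, "E": 2.0, "S": 0.5},
    ]
    print(segment(text, viterbi(emissions)))
```

Choosing the highest-scoring tag for each character independently can yield an invalid sequence such as B followed by S. Decoding over whole sequences rules this out, which is the role the abstract assigns to the CRF layer.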
