Abstract

The task of Chinese word segmentation (CWS) is to split a continuous sequence of Chinese characters into a sequence of individual words according to a given set of segmentation rules. Because it is the most basic step in Chinese natural language processing, CWS is of great importance. In this paper, using deep learning methods, we added radical features when pretraining the character vectors, which improved the segmentation results. A Dilated Convolutional Neural Network (DCNN) was trained and compared with a Bi-LSTM baseline to evaluate its segmentation ability. A Conditional Random Field (CRF) layer was then combined with each of these two networks to identify the best model. The experimental results show that the DCNN model outperforms the widely used Bi-LSTM model. Joint training on several corpora was also applied to further improve the results, and it yields a higher F1 score.
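The abstract does not state the labelling scheme or the evaluation procedure. The following is a minimal sketch that assumes the common BMES character-tagging formulation (B = begin, M = middle, E = end of a multi-character word, S = single-character word). The use of a CRF output layer suggests this kind of sequence labelling, but the scheme itself is our assumption. The sketch shows how a predicted tag sequence decodes into words and how word-level precision, recall and F1 are typically computed. All function names are hypothetical illustrations, not code from the paper.

```python
from typing import List, Set, Tuple


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Decode a BMES tag sequence (assumed scheme) into a list of words."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        # An E (end) or S (single) tag closes the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a malformed sequence that ends mid-word
        words.append(current)
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map each word to its (start, end) character offsets in the sentence."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def segmentation_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1: a predicted word counts as
    correct only if both of its boundaries match a gold-standard word."""
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = "我爱北京天安门"
    gold = ["我", "爱", "北京", "天安门"]
    # A prediction that wrongly merges the last two words.
    pred = tags_to_words(sentence, ["S", "S", "B", "M", "M", "M", "E"])
    print(pred)  # ['我', '爱', '北京天安门']
    p, r, f = segmentation_f1(gold, pred)
    print(round(p, 4), round(r, 4), round(f, 4))  # 0.6667 0.5 0.5714
```

Comparing character-offset spans rather than word strings ensures that a word counts as correct only when it appears at the same position as in the gold segmentation. This is why merging 北京 and 天安门 lowers both precision and recall.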
