Abstract

Conditional random fields (CRFs) are among the classic models for Chinese word segmentation (CWS). Deep neural networks (DNNs) have recently emerged as a research hotspot in natural language processing (NLP). However, studies applying DNNs to CWS have not yielded significant gains over CRF models, so improving CRFs for CWS remains a viable avenue for research. This paper proposes two methods to enhance CRF-based CWS. First, a fast and effective sequential forward selection (SFS)-style method is used for feature template selection, balancing search performance against search speed. Second, the paper describes a character normalization method that is more robust than the traditional one. Incremental evaluations on the second SIGHAN bakeoff show that the two proposed methods reduce the F-score error by 7.8% and 10.6%, respectively. The final system achieves F-scores of 0.955 (AS), 0.955 (CITYU), 0.970 (MSR), and 0.952 (PKU), which are comparable to those of the best systems reported in the literature.
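
As a rough illustration of the selection idea mentioned above, the following is a minimal sketch of textbook greedy sequential forward selection, not the paper's SFS-style variant, whose details the abstract does not give. The template names (`C0`, `C-1C0`, and so on), the `toy_score` function, and its integer gains are hypothetical; in a real setting the scoring function would presumably train a CRF with the given templates and return a development-set F-score.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def sequential_forward_selection(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedily add the candidate that most improves `score`.

    Stops when no remaining candidate improves the score by more
    than `min_gain`.
    """
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(candidates)
    while remaining:
        # Evaluate each remaining candidate added to the current set.
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        top_score, top = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no candidate helps enough; stop searching
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected, best


def toy_score(templates: FrozenSet[str]) -> float:
    # Hypothetical stand-in for "train a CRF and measure dev F-score".
    # Integer gains keep the arithmetic exact for the example.
    gains = {"C0": 80, "C-1C0": 10, "C0C1": 5, "C-2": -1}
    return sum(gains[t] for t in templates)


selected, best = sequential_forward_selection(
    ["C-2", "C0C1", "C-1C0", "C0"], toy_score
)
```

Each round costs one evaluation per remaining candidate, so the search needs at most on the order of n² evaluations for n templates rather than the 2ⁿ required by exhaustive search. That is the kind of trade-off between search performance and search speed the abstract refers to.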
