Abstract Recently, deep learning methods have achieved remarkable success in the Chinese word segmentation (CWS) task. Some of these methods enhance the CWS model by utilizing contextual features and external resources (e.g., sub-words, lexicons, and syntax). However, existing approaches fail to fully exploit these heterogeneous features and their structural information. Therefore, in this paper, we propose a heterogeneous information learning framework for CWS, named heterogeneous graph neural segmenter (HGNSeg), which exploits heterogeneous features with graph convolutional networks and a pretrained language model. Experimental results on six benchmark datasets (e.g., SIGHAN 2005 and SIGHAN 2008) confirm that HGNSeg effectively improves CWS performance. Importantly, HGNSeg also demonstrates a strong ability to alleviate the out-of-vocabulary (OOV) issue in cross-domain scenarios.
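To make the idea of combining heterogeneous features with graph convolution more concrete, the following is a minimal sketch of one generic GCN propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W), over a toy graph that mixes character nodes and lexicon-word nodes. It is not the HGNSeg architecture. The graph construction (edges between adjacent characters and between each matched word and the characters it spans), the helper names `build_graph` and `gcn_layer`, and the one-hot input features are all hypothetical illustrations. The pretrained language model is omitted entirely.

```python
import math


def build_graph(num_chars, word_spans):
    """Hypothetical heterogeneous graph: character nodes first, then word nodes.

    word_spans holds (start, end) character spans of lexicon matches (end exclusive).
    """
    n = num_chars + len(word_spans)
    adj = [[0.0] * n for _ in range(n)]
    # Connect adjacent characters in the sentence.
    for i in range(num_chars - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1.0
    # Connect each word node to every character it covers.
    for w, (start, end) in enumerate(word_spans):
        node = num_chars + w
        for c in range(start, end):
            adj[node][c] = adj[c][node] = 1.0
    return adj


def gcn_layer(adj, feats, weight):
    """One generic GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    in_dim = len(feats[0])
    # Aggregate neighbour features: norm @ H.
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(in_dim)]
           for i in range(n)]
    out_dim = len(weight[0])
    # Linear projection (agg @ W) followed by ReLU.
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]


if __name__ == "__main__":
    chars = list("南京市长江大桥")
    # Hypothetical lexicon matches: 南京, 南京市, 市长, 长江, 大桥, 长江大桥.
    spans = [(0, 2), (0, 3), (2, 4), (3, 5), (5, 7), (3, 7)]
    adj = build_graph(len(chars), spans)
    n = len(adj)
    feats = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # one-hot
    weight = [[1.0, -1.0] for _ in range(n)]  # toy projection to 2 dimensions
    for label, row in zip(chars + ["w%d" % i for i in range(len(spans))],
                          gcn_layer(adj, feats, weight)):
        print(label, [round(v, 3) for v in row])
```

In this toy graph the ambiguous character 市 is linked both to 南京市 and to 市长, so after one layer its representation mixes evidence from both competing segmentations; a trained model would learn weights that help a tagger decide between them.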