Lexicon‐Augmented Cross‐Domain Chinese Word Segmentation with Graph Convolutional Network

Hao Yu,Yu Wang,Kaiyu Huang,Degen Huang

doi:10.1049/cje.2021.00.363

Abstract

Existing neural approaches have achieved significant progress for Chinese word segmentation (CWS). The performances of these methods tend to drop dramatically in the cross-domain scenarios due to the data distribution mismatch across domains and the out of vocabulary words problem. To address these two issues, proposes a lexicon-augmented graph convolutional network for cross-domain CWS. The novel model can capture the information of word boundaries from all candidate words and utilize domain lexicons to alleviate the distribution gap across domains. Experimental results on the cross-domain CWS datasets (SIGHAN-2010 and TCM) show that the proposed method successfully models information of domain lexicons for neural CWS approaches and helps to achieve competitive performance for cross-domain CWS. The two problems of cross-domain CWS can be effectively solved through various interactions between characters and candidate words based on graphs. Further, experiments on the CWS benchmarks (Bakeoff-2005) also demonstrate the robustness and efficiency of the proposed method.

Full Text