Domain-Aware Word Segmentation for Chinese Language: A Document-Level Context-Aware Model

Kaiyu Huang, Zhuang Liu, Fengran Mo, Keli Xiao, Bo Jin, Degen Huang

doi:10.1145/3481298

Abstract

Word segmentation is an essential and challenging task in natural language processing, especially for Chinese, given its high linguistic complexity. Existing methods for Chinese word segmentation, including statistical machine learning methods and neural network methods, usually perform well within specific knowledge domains. Given the increasing importance of interdisciplinary and cross-domain studies, a key challenge in cross-domain word segmentation is handling out-of-vocabulary (OOV) words, and the performance of existing methods falls short of practical standards. To this end, we propose a document-level context-aware model that can automatically perceive and identify OOV words from different domains. Our method jointly implements a word-based model and a character-based model and then processes their results with a newly proposed reconstruction model. We evaluate the new method through comprehensive experiments on two real-world datasets (e.g., news from different domains). The results demonstrate the superiority of our method over state-of-the-art models in handling texts from different domains. Importantly, in the cross-domain scenario, our proposed method improves the recognition of OOV words.
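
To make the OOV claim concrete, the sketch below shows one common way OOV recall is computed in word segmentation evaluation: a gold-standard word counts as OOV if it is absent from the training vocabulary, and it counts as recognized only if the predicted segmentation contains exactly the same character span. This is an illustrative assumption, not the paper's evaluation script; the function names `to_spans` and `oov_recall` and the toy data are hypothetical.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmented sentence into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def oov_recall(gold_sents: List[List[str]],
               pred_sents: List[List[str]],
               vocab: Set[str]) -> float:
    """Fraction of gold OOV words whose exact span appears in the prediction.

    Assumes gold and predicted sentences cover the same character sequence.
    """
    total = 0
    hit = 0
    for gold, pred in zip(gold_sents, pred_sents):
        pred_spans = to_spans(pred)
        start = 0
        for word in gold:
            span = (start, start + len(word))
            start += len(word)
            if word not in vocab:  # word never seen in the training vocabulary
                total += 1
                if span in pred_spans:
                    hit += 1
    return hit / total if total else 0.0


# Toy data (hypothetical): "机器" is the only gold word missing from the vocabulary.
vocab = {"我", "喜欢", "学习"}
gold = [["我", "喜欢", "机器", "学习"]]
pred_merged = [["我", "喜欢", "机器学习"]]  # OOV word merged with its neighbour

print(oov_recall(gold, pred_merged, vocab))
print(oov_recall(gold, gold, vocab))
```

Matching on character spans rather than on word strings avoids crediting a prediction that produces the right word at the wrong position in the sentence.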

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Nov 3, 2021
Citations: 1

Similar Papers

Global discriminative model for dependency parsing in NLP pipeline
Miao Li ... Hongyi Ding
01 Sep 2014

New Words Discovery Method Based On Word Segmentation Result
Heyang Liu ... Pengdong Gao
01 Jun 2018

Word Embeddings for Natural Language Processing
01 Jan 2015

Lexicon-Based Graph Convolutional Network for Chinese Word Segmentation
23 Oct 2021
