Abstract

Chinese word segmentation and dependency parsing are two fundamental tasks for Chinese natural language processing. Dependency parsing is defined at the word level, so word segmentation is a precondition for dependency parsing; as a result, dependency parsing suffers from error propagation and cannot directly make use of character-level pre-trained language models (such as BERT). In this paper, we propose a graph-based model that integrates Chinese word segmentation and dependency parsing. Unlike previous transition-based joint models, our proposed model is more concise and requires less feature engineering. Our graph-based joint model outperforms previous joint models and achieves state-of-the-art results in both Chinese word segmentation and dependency parsing. Additionally, when combined with BERT, our model substantially reduces the performance gap in dependency parsing between joint models and gold-segmented word-based models. Our code is publicly available at https://github.com/fastnlp/JointCwsParser

Highlights

  • Unlike English, Chinese sentences consist of continuous characters and lack obvious boundaries between Chinese words

  • Since words are usually regarded as the minimum semantic unit, Chinese word segmentation (CWS) becomes a preliminary pre-processing step for downstream Chinese natural language processing (NLP)

  • Compared with previous transition-based joint models, our proposed model is a graph-based model, which requires less feature engineering

Summary

Introduction

Unlike English, Chinese sentences consist of continuous characters and lack obvious boundaries between Chinese words. The three tasks of word segmentation, POS tagging, and dependency parsing are strongly related, and in a pipeline, segmentation errors propagate to the downstream tasks. A traditional solution to this error-propagation problem is to use joint models (Hatori et al., 2012; Zhang et al., 2014; Kurita et al., 2017). These previous joint models mainly adopted a transition-based parsing framework to integrate word segmentation, POS tagging, and dependency parsing: based on standard sequential shift-reduce transitions, they design extra actions for word segmentation and POS tagging. Although these joint models achieved better performance than pipeline models, they still suffer from two limitations. We propose a joint model for CWS and dependency parsing that integrates the two tasks into a unified graph-based parsing framework. With our proposed model, we can exploit BERT to perform CWS and dependency parsing jointly.
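The biaffine layer named in the outline scores every head–dependent pair over the encoder's character representations, in the style of Dozat and Manning's biaffine parser. Below is a minimal NumPy sketch of unlabeled arc scoring; the function and variable names (`biaffine_scores`, `head_repr`, `dep_repr`) and the random weights are illustrative stand-ins under that assumption, not the paper's actual implementation:

```python
import numpy as np

def biaffine_scores(head_repr, dep_repr, U, u_head, u_dep):
    """Score every (dependent, head) character pair.

    head_repr: (n, d) head-role vectors; dep_repr: (n, d) dependent-role
    vectors (in practice produced by MLPs over a BiLSTM/BERT encoder).
    Returns an (n, n) matrix whose entry [i, j] scores character j as
    the head of character i.
    """
    # bilinear term: dep_i^T U head_j for every pair (i, j)
    bilinear = dep_repr @ U @ head_repr.T
    # linear bias terms for the dependent and head roles (broadcast to n x n)
    linear = dep_repr @ u_dep[:, None] + (head_repr @ u_head[:, None]).T
    return bilinear + linear

# toy example: 5 characters, hidden size 8, random "trained" parameters
rng = np.random.default_rng(0)
n, d = 5, 8
head_repr = rng.standard_normal((n, d))
dep_repr = rng.standard_normal((n, d))
U = rng.standard_normal((d, d))
u_head = rng.standard_normal(d)
u_dep = rng.standard_normal(d)

scores = biaffine_scores(head_repr, dep_repr, U, u_head, u_dep)
head_of = scores.argmax(axis=1)  # greedy head choice per character
```

At the character level, segmentation can then be read off the predicted arcs together with their labels, while inter-word arcs give the dependency tree; a maximum-spanning-tree decoder would replace the greedy `argmax` in a real system.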

Related Work
Joint Segmentation and POS Tagging
Joint POS Tagging and Dependency Parsing
Proposed Model
Encoding Layer
BiLSTM-based Encoding Layer
BERT-based Encoding Layer
Biaffine Layer
Unlabeled Arc Prediction
Arc Label Prediction
Measures
Models for Word Segmentation Only
Datasets
Experimental Settings
Proposed Models
Comparison with the Previous Joint Models
Chinese Word Segmentation
Comparison with the Pipeline Model
Ablation Study
Error Analysis
Findings
Conclusion and Future Work