Abstract

We present neural network-based joint models for Chinese word segmentation, POS tagging and dependency parsing. Our models are the first neural approaches to fully joint Chinese analysis, which is known to prevent the error propagation problem of pipeline models. Although word embeddings play a key role in dependency parsing, they could not be applied directly to the joint task in previous work. To address this problem, we propose embeddings of character strings, in addition to words. Experiments show that our models outperform existing systems in Chinese word segmentation and POS tagging, and achieve competitive accuracy in dependency parsing. We also explore bi-LSTM models that use fewer features.
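To make the character-string embedding idea concrete, the following is a minimal sketch (in PyTorch) of backing off from a word embedding to an embedding of the token's character string when the token has no word vector. The vocabularies word_vocab and charstr_vocab, the class name, and the single-table lookup are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class WordOrCharStringEmbedding(nn.Module):
    """Look up a word embedding for known words; otherwise fall back to an
    embedding of the token's character string (hypothetical scheme for
    unknown words and incomplete tokens)."""

    def __init__(self, word_vocab, charstr_vocab, dim):
        super().__init__()
        self.word_vocab = word_vocab          # dict: word -> index
        self.charstr_vocab = charstr_vocab    # dict: character string -> index
        self.word_emb = nn.Embedding(len(word_vocab), dim)
        self.charstr_emb = nn.Embedding(len(charstr_vocab) + 1, dim)  # +1 for UNK
        self.unk_index = len(charstr_vocab)

    def forward(self, token):
        if token in self.word_vocab:
            idx = torch.tensor([self.word_vocab[token]])
            return self.word_emb(idx)
        # Unknown word or incomplete token: use its character-string embedding.
        idx = torch.tensor([self.charstr_vocab.get(token, self.unk_index)])
        return self.charstr_emb(idx)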

Highlights

  • Dependency parsers have been enhanced by the use of neural networks and embedding vectors (Chen and Manning, 2014; Weiss et al., 2015; Zhou et al., 2015; Alberti et al., 2015; Andor et al., 2016; Dyer et al., 2015).

  • Our contributions are summarized as follows: (1) we propose the first embedding-based fully joint parsing model; (2) we use character string embeddings for unknown words (UNK) and incomplete tokens; (3) we explore bidirectional LSTM (bi-LSTM) models to avoid the detailed feature engineering of previous approaches (see the sketch after this list); (4) in experiments on a Chinese corpus, we achieve state-of-the-art scores in word segmentation, POS tagging and dependency parsing.

  • Our joint word segmentation and POS tagging model (SegTag) is superior to these previous models, including Hatori et al. (2012)'s model with rich dictionary information, in terms of both segmentation and POS tagging accuracy.
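As referenced in contribution (3) above, a bi-LSTM can replace hand-crafted feature templates with learned contextual features. Below is a minimal, self-contained PyTorch sketch of such an encoder over character embeddings; the class name BiLSTMEncoder and the layer sizes are illustrative assumptions rather than the paper's configuration.

import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Encode a character sequence with a bi-LSTM so that downstream
    segmentation/tagging/parsing decisions can read contextual features
    instead of hand-engineered feature templates."""

    def __init__(self, num_chars, char_dim=64, hidden_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, char_ids):
        # char_ids: (batch, sentence_length) tensor of character indices
        embedded = self.char_emb(char_ids)      # (batch, len, char_dim)
        contextual, _ = self.bilstm(embedded)   # (batch, len, 2 * hidden_dim)
        return contextual

# Usage: encode a toy 5-character sentence with a 100-character vocabulary.
encoder = BiLSTMEncoder(num_chars=100)
features = encoder(torch.randint(0, 100, (1, 5)))
print(features.shape)  # torch.Size([1, 5, 256])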

Summary

Introduction

Dependency parsers have been enhanced by the use of neural networks and embedding vectors (Chen and Manning, 2014; Weiss et al., 2015; Zhou et al., 2015; Alberti et al., 2015; Andor et al., 2016; Dyer et al., 2015). When these dependency parsers process sentences in English and other languages that use explicit symbols to separate words, they can be very accurate. In contrast, pipeline models achieve dependency scores of only around 80% for Chinese.
