Global Transition-based Non-projective Dependency Parsing

Carlos Gómez-Rodríguez,Lillian Lee,Tianze Shi

doi:10.18653/v1/p18-1248

Abstract

Shi, Huang, and Lee (2017a) obtained state-of-the-art results for English and Chinese dependency parsing by combining dynamic-programming implementations of transition-based dependency parsers with a minimal set of bidirectional LSTM features. However, their results were limited to projective parsing. In this paper, we extend their approach to support non-projectivity by providing the first practical implementation of the MH₄ algorithm, an O(n4) mildly nonprojective dynamic-programming parser with very high coverage on non-projective treebanks. To make MH₄ compatible with minimal transition-based feature sets, we introduce a transition-based interpretation of it in which parser items are mapped to sequences of transitions. We thus obtain the first implementation of global decoding for non-projective transition-based parsing, and demonstrate empirically that it is effective than its projective counterpart in parsing a number of highly non-projective languages.

Highlights

Transition-based dependency parsers are a popular approach to natural language parsing, as they achieve good results in terms of accuracy and efficiency (Yamada and Matsumoto, 2003; Nivre and Scholz, 2004; Zhang and Nivre, 2011; Chen and Manning, 2014; Dyer et al, 2015; Andor et al, 2016; Kiperwasser and Goldberg, 2016)
While cubic-time exact inference algorithms for several well-known projective transition systems had been known since the work of Huang and Sagae (2010) and Kuhlmann et al (2011), they had been considered of theoretical interest only due to their incompatibility with rich feature models: incorporation of complex features resulted in jumps in asymptotic runtime complexity to impractical levels
Data and Evaluation We experiment with the Universal Dependencies (UD) 2.0 dataset used for the CoNLL 2017 shared task (Zeman et al, 2017)

Summary

Introduction

Transition-based dependency parsers are a popular approach to natural language parsing, as they achieve good results in terms of accuracy and efficiency (Yamada and Matsumoto, 2003; Nivre and Scholz, 2004; Zhang and Nivre, 2011; Chen and Manning, 2014; Dyer et al, 2015; Andor et al, 2016; Kiperwasser and Goldberg, 2016). The recent popularization of bidirectional long-short term memory networks (biLSTMs; Hochreiter and Schmidhuber, 1997) to derive feature representations for parsing, given their capacity to capture long-range information, has demonstrated that one may not need to use complex feature models to obtain good accuracy (Kiperwasser and Goldberg, 2016; Cross and Huang, 2016) In this context, Shi et al (2017a) presented an implementation of the exact inference algorithms of Kuhlmann et al (2011) with a minimal set of only two bi-LSTM-based feature vectors. This kept the complexity cubic, and obtained state-of-the-art results in English and Chinese parsing

Methods

Results

Conclusion