Abstract

We first present a minimal feature set for transition-based dependency parsing, continuing a recent trend started by Kiperwasser and Goldberg (2016a) and Cross and Huang (2016a) of using bi-directional LSTM features. We plug our minimal feature set into the dynamic-programming framework of Huang and Sagae (2010) and Kuhlmann et al. (2011) to produce the first implementation of worst-case O(n³) exact decoders for arc-hybrid and arc-eager transition systems. With our minimal features, we also present O(n³) global training methods. Finally, using ensembles including our new parsers, we achieve the best unlabeled attachment score reported (to our knowledge) on the Chinese Treebank and the “second-best-in-class” result on the English Penn Treebank.

Highlights

  • It used to be the case that the most accurate dependency parsers made global decisions and employed exact decoding

  • Historically, to make accurate attachment decisions, transition-based dependency parsers (TBDPs) have required a large feature set that accesses rich information about particular positions in the stack and buffer of the current parser configuration

  • Although polynomial-time exact-decoding algorithms do exist, introduced by Huang and Sagae (2010) and Kuhlmann et al. (2011), consulting many positions makes them prohibitively costly in practice, since the number of positions considered can factor into the exponent of the running time


Summary

Introduction

It used to be the case that the most accurate dependency parsers made global decisions and employed exact decoding. Transition-based dependency parsers (TBDPs) have recently achieved state-of-the-art performance, despite the fact that, for efficiency reasons, they are usually trained to make local, rather than global, decisions, and the decoding process is done approximately, rather than exactly (Weiss et al., 2015; Dyer et al., 2015; Andor et al., 2016). Historically, to make accurate (local) attachment decisions, TBDPs have required a large feature set that accesses rich information about particular positions in the stack and buffer of the current parser configuration. Huang and Sagae (2010) employ a fairly reduced set of nine positions, but the worst-case running time for the exact-decoding version of their algorithm is O(n⁶) (originally reported as O(n⁷)) for a length-n sentence. As an extreme case, Dyer et al. (2015) use an LSTM to summarize arbitrary information on the stack, which completely rules out dynamic programming.
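To make the stack/buffer terminology concrete, the following is a minimal sketch of the arc-hybrid transition system (Kuhlmann et al., 2011) that the exact decoders above operate over. The class and method names are illustrative choices of ours, not the paper's implementation, and the scoring model (the bi-LSTM features over a few positions) is omitted; the hand-picked action sequence stands in for a trained parser's decisions.

```python
class ArcHybrid:
    """Parser configuration: a stack, a buffer, and a set of (head, dependent) arcs."""

    def __init__(self, n):
        self.stack = [0]                     # token 0 is the artificial ROOT
        self.buffer = list(range(1, n + 1))  # remaining input tokens
        self.arcs = set()

    def shift(self):
        # Move the front of the buffer onto the stack.
        self.stack.append(self.buffer.pop(0))

    def left_arc(self):
        # Attach the stack top s0 as a dependent of the buffer front b0; pop s0.
        self.arcs.add((self.buffer[0], self.stack.pop()))

    def right_arc(self):
        # Attach the stack top s0 as a dependent of the item below it, s1; pop s0.
        dep = self.stack.pop()
        self.arcs.add((self.stack[-1], dep))


# "She reads books" (tokens 1, 2, 3); gold arcs: 2->1, 2->3, 0->2.
c = ArcHybrid(3)
for action in ["shift", "left_arc", "shift", "shift", "right_arc", "right_arc"]:
    getattr(c, action)()
print(sorted(c.arcs))  # [(0, 2), (2, 1), (2, 3)]
```

Note that each transition consults only the topmost stack items and the buffer front; it is this locality of the feature positions that dynamic-programming decoders exploit, and why feature sets touching many positions blow up the exponent of the running time.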

