Abstract

We first present a minimal feature set for transition-based dependency parsing, continuing a recent trend started by Kiperwasser and Goldberg (2016a) and Cross and Huang (2016a) of using bi-directional LSTM features. We plug our minimal feature set into the dynamic-programming framework of Huang and Sagae (2010) and Kuhlmann et al. (2011) to produce the first implementation of worst-case O(n³) exact decoders for arc-hybrid and arc-eager transition systems. With our minimal features, we also present O(n³) global training methods. Finally, using ensembles including our new parsers, we achieve the best unlabeled attachment score reported (to our knowledge) on the Chinese Treebank and the “second-best-in-class” result on the English Penn Treebank.

Highlights

  • It used to be the case that the most accurate dependency parsers made global decisions and employed exact decoding

  • Historically, to make accurate attachment decisions, transition-based dependency parsers (TBDPs) have required a large feature set that accesses rich information about particular positions in the stack and buffer of the current parser configuration

  • Although polynomial-time exact-decoding algorithms do exist, introduced by Huang and Sagae (2010) and Kuhlmann et al. (2011), consulting many positions makes them prohibitively costly in practice, since the number of positions considered can factor into the exponent of the running time


Summary

Introduction

It used to be the case that the most accurate dependency parsers made global decisions and employed exact decoding. Transition-based dependency parsers (TBDPs) have recently achieved state-of-the-art performance, despite the fact that, for efficiency reasons, they are usually trained to make local, rather than global, decisions, and the decoding process is done approximately, rather than exactly (Weiss et al., 2015; Dyer et al., 2015; Andor et al., 2016). Historically, to make accurate (local) attachment decisions, TBDPs have required a large feature set that accesses rich information about particular positions in the stack and buffer of the current parser configuration. Huang and Sagae (2010) employ a fairly reduced set of nine positions, but the worst-case running time for the exact-decoding version of their algorithm is O(n⁶) (originally reported as O(n⁷)) for a length-n sentence. As an extreme case, Dyer et al. (2015) use an LSTM to summarize arbitrary information on the stack, which completely rules out dynamic programming.
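To make the stack/buffer terminology concrete, the following is a minimal sketch of the arc-hybrid transition system (Kuhlmann et al., 2011) that the exact decoders above operate over. The class and method names are illustrative choices of ours, not the paper's implementation, and the scoring model (the bi-LSTM features over a few positions) is omitted; the hand-picked action sequence stands in for a trained parser's decisions.

```python
class ArcHybrid:
    """Parser configuration: a stack, a buffer, and a set of (head, dependent) arcs."""

    def __init__(self, n):
        self.stack = [0]                     # token 0 is the artificial ROOT
        self.buffer = list(range(1, n + 1))  # remaining input tokens
        self.arcs = set()

    def shift(self):
        # Move the front of the buffer onto the stack.
        self.stack.append(self.buffer.pop(0))

    def left_arc(self):
        # Attach the stack top s0 as a dependent of the buffer front b0; pop s0.
        self.arcs.add((self.buffer[0], self.stack.pop()))

    def right_arc(self):
        # Attach the stack top s0 as a dependent of the item below it, s1; pop s0.
        dep = self.stack.pop()
        self.arcs.add((self.stack[-1], dep))


# "She reads books" (tokens 1, 2, 3); gold arcs: 2->1, 2->3, 0->2.
c = ArcHybrid(3)
for action in ["shift", "left_arc", "shift", "shift", "right_arc", "right_arc"]:
    getattr(c, action)()
print(sorted(c.arcs))  # [(0, 2), (2, 1), (2, 3)]
```

Note that each transition consults only the topmost stack items and the buffer front; it is this locality of the feature positions that dynamic-programming decoders exploit, and why feature sets touching many positions blow up the exponent of the running time.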

