Abstract

Graph-based and transition-based dependency parsers used to have different strengths and weaknesses. Therefore, combining the outputs of parsers from both paradigms used to be the standard approach to improving or analyzing their performance. However, with the recent adoption of deep contextualized word representations, the chief weakness of graph-based models, i.e., their limited scope of features, has been mitigated. Through two popular combination techniques, blending and stacking, we demonstrate that the remaining diversity between the parsing models is reduced below the level of variation between models trained with different random seeds. Thus, integrating them no longer leads to increased accuracy. When both parsers depend on BiLSTMs, the graph-based architecture has a consistent advantage. This advantage stems from globally trained BiLSTM representations, which capture more distant look-ahead syntactic relations. Such representations can be exploited through multi-task learning, which improves the transition-based parser, especially on treebanks with a high ratio of right-headed dependencies.
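As a concrete illustration of blending, one common variant (in the spirit of Sagae and Lavie's reparsing approach) accumulates weighted votes for every head-dependent arc proposed by the individual parsers and then extracts the highest-scoring spanning tree over the vote graph. The sketch below is a minimal Python rendering of this idea; the function names and the networkx-based tree extraction are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of blending (reparsing): accumulate weighted arc votes
# from several parsers, then extract a maximum spanning arborescence.
# Illustrative only -- not the implementation evaluated in the paper.
import networkx as nx

def blend(parses, n_tokens, weights=None):
    """parses: list of head lists; heads[i] is the head of token i+1 (0 = root)."""
    weights = weights or [1.0] * len(parses)
    G = nx.DiGraph()
    for heads, w in zip(parses, weights):
        for dep, head in enumerate(heads, start=1):
            if G.has_edge(head, dep):
                G[head][dep]["weight"] += w
            else:
                G.add_edge(head, dep, weight=w)
    # The highest-scoring dependency tree over the vote graph; node 0
    # (the artificial root) has no incoming arcs, so it becomes the root.
    tree = nx.maximum_spanning_arborescence(G)
    combined = [0] * n_tokens
    for head, dep in tree.edges:
        combined[dep - 1] = head
    return combined

# Two parsers agree on both arcs, a third disagrees; majority wins:
print(blend([[0, 1], [0, 1], [2, 0]], n_tokens=2))  # -> [0, 1]
```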

Highlights

  • Dependency parsers can roughly be divided into two classes: graph-based (Eisner, 1996; McDonald et al., 2005) and transition-based (Yamada and Matsumoto, 2003; Nivre, 2003).

  • On average, the differences between BiLSTM-based graph- and transition-based models are reduced below the level of variation between models trained with different random seeds.

  • In experiments where the graph-based (GB) parser is trained without BiLSTMs, we extend the feature set with surface features known from classic graph-based parsers, such as the distance between head and dependent, and the words at distance 1 and 2 from heads and dependents (McDonald et al., 2005), as sketched below.
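The following is a minimal sketch of how such surface features could be extracted for a candidate arc; the function name, marker tokens, and feature keys are illustrative assumptions rather than the exact feature templates of McDonald et al. (2005).

```python
# Sketch of classic surface features for a candidate arc head -> dep,
# assuming 1-based token indices with 0 as the artificial root.
def arc_surface_features(words, head, dep):
    toks = ["<ROOT>"] + words          # index 0 is the artificial root
    def ctx(i, off):                   # word at offset `off`, or a boundary marker
        j = i + off
        return toks[j] if 0 <= j < len(toks) else "<PAD>"
    feats = {
        "distance": abs(head - dep),
        "direction": "left" if dep < head else "right",
        "head_word": toks[head],
        "dep_word": toks[dep],
    }
    for off in (-2, -1, 1, 2):         # words at distance 1 and 2 on either side
        feats[f"head{off:+d}"] = ctx(head, off)
        feats[f"dep{off:+d}"] = ctx(dep, off)
    return feats

print(arc_surface_features(["She", "reads", "books"], head=2, dep=3))
```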

Summary

Introduction

Dependency parsers can roughly be divided into two classes: graph-based (Eisner, 1996; McDonald et al., 2005) and transition-based (Yamada and Matsumoto, 2003; Nivre, 2003). One of the most significant recent developments in dependency parsing is based on encoding rich sentential context into word representations, such as BiLSTM vectors (Hochreiter and Schmidhuber, 1997; Graves and Schmidhuber, 2005) and deep contextualized word embeddings (Peters et al., 2018; Devlin et al., 2019). Including these representations as features has set a new state of the art for both graph-based and transition-based parsers (Kiperwasser and Goldberg, 2016; Che et al., 2018). The introduction of BiLSTMs into dependency parsers had another consequence: it enabled the use of exact search algorithms for transition-based parsers (Shi et al., 2017a; Gómez-Rodríguez et al., 2018). It is an interesting question whether the error profiles of such parsers are even less distinguishable from graph-based outputs.
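As an illustration of this design, the sketch below follows the Kiperwasser and Goldberg (2016) recipe at a high level: a BiLSTM encodes the sentence, and an MLP scores every candidate head-dependent pair from the concatenated BiLSTM states. All layer sizes and names are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal PyTorch sketch of a BiLSTM-based arc scorer in the style of
# Kiperwasser and Goldberg (2016). Sizes and names are illustrative.
import torch
import torch.nn as nn

class BiLSTMArcScorer(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=125):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(4 * hidden, 100),
                                 nn.Tanh(), nn.Linear(100, 1))

    def forward(self, word_ids):                       # (batch, seq)
        states, _ = self.bilstm(self.embed(word_ids))  # (batch, seq, 2*hidden)
        n = states.size(1)
        heads = states.unsqueeze(2).expand(-1, -1, n, -1)
        deps = states.unsqueeze(1).expand(-1, n, -1, -1)
        # scores[b, h, d] = score of the arc h -> d
        return self.mlp(torch.cat([heads, deps], dim=-1)).squeeze(-1)

scores = BiLSTMArcScorer(vocab_size=1000)(torch.randint(0, 1000, (1, 6)))
print(scores.shape)  # torch.Size([1, 6, 6])
```

A graph-based parser would decode a tree from this score matrix with an exact algorithm such as Eisner's or Chu-Liu/Edmonds, whereas a transition-based parser consumes the same BiLSTM states through its transition classifier.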
