Abstract

We introduce a novel transition system for discontinuous constituency parsing. Instead of storing subtrees in a stack –i.e. a data structure with linear-time sequential access– the proposed system uses a set of parsing items, with constant-time random access. This change makes it possible to construct any discontinuous constituency tree in exactly 4n–2 transitions for a sentence of length n. At each parsing step, the parser considers every item in the set to be combined with a focus item and to construct a new constituent in a bottom-up fashion. The parsing strategy is based on the assumption that most syntactic structures can be parsed incrementally and that the set –the memory of the parser– remains reasonably small on average. Moreover, we introduce a provably correct dynamic oracle for the new transition system, and present the first experiments in discontinuous constituency parsing using a dynamic oracle. Our parser obtains state-of-the-art results on three English and German discontinuous treebanks.
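The contrast between stack-based and set-based memory can be illustrated with a minimal Python sketch. It is not taken from the released parser: the names `Item`, `combine`, `is_discontinuous` and `derivation_length` are hypothetical, and the sketch leaves out labelling, action scoring, and the oracle. It only shows that a focus item can be merged with any item in the set, which may yield a discontinuous span, and it restates the 4n–2 derivation length given above.

```python
from typing import FrozenSet, Set

# Assumption: a parsing item is modelled as the set of token indices it covers.
Item = FrozenSet[int]


def combine(memory: Set[Item], focus: Item, other: Item) -> Item:
    """Merge the focus item with any item held in the set-based memory.

    Unlike a stack, the memory gives direct access to every item,
    so `other` does not have to be the most recently built one.
    """
    if other not in memory:
        raise ValueError("the selected item is not in the memory")
    memory.remove(other)
    return focus | other


def is_discontinuous(item: Item) -> bool:
    """An item is discontinuous if its token indices contain a gap."""
    return max(item) - min(item) + 1 != len(item)


def derivation_length(n: int) -> int:
    """Number of transitions stated in the abstract for a sentence of length n."""
    return 4 * n - 2


# Usage: tokens 0 and 1 are in memory, token 2 is the focus item.
memory = {frozenset({0}), frozenset({1})}
focus = frozenset({2})
new_item = combine(memory, focus, frozenset({0}))  # skips over token 1
```

In this toy run, `new_item` covers tokens 0 and 2 while token 1 stays in memory, which is the kind of non-adjacent combination that stack-based systems need SWAP or GAP actions to reach.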

Highlights

  • Transition-based discontinuous parsers construct discontinuous constituents by reordering terminals with the SWAP action (Versley, 2014a,b; Maier, 2015; Maier and Lichte, 2016; Stanojević and Garrido Alhama, 2017), or by using a split stack and the GAP action to combine two non-adjacent constituents (Coavoux and Crabbé, 2017a; Coavoux et al., 2019). These proposals represent the memory of the parser with data structures that offer linear-time sequential access.

  • The effect of the oracle is in line with other published results in projective constituency parsing (Coavoux and Crabbé, 2016; Cross and Huang, 2016b) and dependency parsing (Goldberg and Nivre, 2012; Gómez-Rodríguez et al., 2014): the dynamic oracle improves the generalization capability of the parser.

  • We have presented a novel transition system that dispenses with the use of a stack, i.e. a memory with linear-time sequential access.


Summary

Introduction

Transition-based discontinuous parsers construct discontinuous constituents by reordering terminals with the SWAP action (Versley, 2014a,b; Maier, 2015; Maier and Lichte, 2016; Stanojević and Garrido Alhama, 2017), or by using a split stack and the GAP action to combine two non-adjacent constituents (Coavoux and Crabbé, 2017a; Coavoux et al., 2019). These proposals represent the memory of the parser (i.e. the tree fragments being constructed) with data structures that offer linear-time sequential access (either a stack, or a stack coupled with a double-ended queue). The code of our parser is released as an open-source project at https://gitlab.com/mcoavoux/discoparset

Set-based Transition System
System Description
Oracles
Static Oracle
Dynamic Oracle
A Neural Network based on Constituent Boundaries
Token Representations
Set Representations
Action Scorer
POS Tagger
Objective Function
Experiments
Datasets
Implementation and Protocol
Results
Efficiency
Related Work
Conclusion
A Oracle Correctness
B Hyperparameters
C Detailed Results