Abstract

We propose the Recursive Non-autoregressive Graph-to-Graph Transformer architecture (RNGTr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer, and we apply it to syntactic dependency parsing. We demonstrate the power and effectiveness of RNGTr on several dependency corpora, using a refinement model pre-trained with BERT. We also introduce Syntactic Transformer (SynTr), a non-recursive parser similar to our refinement model. RNGTr can improve the accuracy of a variety of initial parsers on 13 languages from the Universal Dependencies Treebanks, the English and Chinese Penn Treebanks, and the German CoNLL 2009 corpus, even improving over the new state-of-the-art results achieved by SynTr, significantly improving the state of the art for all corpora tested.
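
The refinement loop described in the abstract amounts to re-predicting the full dependency graph, conditioned on the previous prediction, until it stops changing or an iteration budget is exhausted. A minimal sketch of that control flow, assuming hypothetical initial_parser and refinement_model callables (these names are placeholders, not the paper's code):

    # Minimal sketch of RNGTr-style recursive refinement.
    # `initial_parser` and `refinement_model` are hypothetical callables, not the paper's API.
    def refine_parse(sentence, initial_parser, refinement_model, max_iters=3):
        graph = initial_parser(sentence)                   # any initial parse (may even be empty)
        for _ in range(max_iters):
            new_graph = refinement_model(sentence, graph)  # non-autoregressive: predict all edges at once
            if new_graph == graph:                         # fixed point reached, no further corrections
                break
            graph = new_graph
        return graph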

Highlights

  • Self-attention models, such as the Transformer (Vaswani et al., 2017), have been hugely successful in a wide range of natural language processing (NLP) tasks, especially when combined with language-model pre-training, such as BERT (Devlin et al., 2019).

  • After some initial experiments to determine the maximum number of refinement iterations, we report the performance of the RNG Transformer model on the Universal Dependencies (UD) Treebanks, the Penn Treebanks, and the German CoNLL 2009 Treebank.

  • The Syntactic Transformer (SynTr) model significantly outperforms the UDify model, so its remaining errors are harder to correct by adding the RNGTr model (a relative error reduction in LAS after integration of 2.67% for SynTr versus 15.01% for UDify; see the worked example after this list).
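
The relative error reduction quoted in the last highlight measures how much of the baseline parser's remaining LAS error the RNGTr integration removes; a stronger baseline such as SynTr leaves less error to correct, so the same refinement yields a smaller relative reduction. A small worked sketch (the function name and example numbers are ours, not the paper's):

    # Relative error reduction in LAS: fraction of the remaining error removed.
    def relative_error_reduction(las_before, las_after):
        return 100.0 * (las_after - las_before) / (100.0 - las_before)

    # Hypothetical illustration (numbers invented): a baseline at 90.0 LAS improved
    # to 91.5 LAS removes 15% of its remaining error.
    print(relative_error_reduction(90.0, 91.5))  # 15.0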

Summary

Introduction

Self-attention models, such as the Transformer (Vaswani et al., 2017), have been hugely successful in a wide range of natural language processing (NLP) tasks, especially when combined with language-model pre-training, such as BERT (Devlin et al., 2019). These architectures contain a stack of self-attention layers that can capture long-range dependencies over the input sequence, while still representing its sequential order using absolute position encodings. The Graph-to-Graph Transformer parser of Mohammadshahi and Henderson (2020), on which this work builds, predicts one edge of the parse graph at a time, conditioning on the graph of previous edges, so it is an autoregressive model.
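
To make the autoregressive/non-autoregressive contrast concrete: an autoregressive parser decodes one edge per step and conditions each decision on the edges chosen so far, while a non-autoregressive model predicts every word's head in a single pass. A schematic sketch under assumed scorer interfaces (score_next_edge and score_all_heads are hypothetical placeholders, not the paper's code):

    # Schematic contrast between decoding regimes (hypothetical scorer interfaces).
    def autoregressive_parse(words, score_next_edge):
        edges = set()
        for _ in range(len(words)):                    # one edge decision per word
            edges.add(score_next_edge(words, edges))   # conditions on previously predicted edges
        return edges

    def non_autoregressive_parse(words, score_all_heads):
        heads = score_all_heads(words)                 # one parallel pass: head index for every word
        return {(head, dep) for dep, head in enumerate(heads)}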
