Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing

Daniel Fried, Dan Klein

doi:10.18653/v1/p18-2075

Abstract

Dynamic oracles provide strong supervision for training constituency parsers with exploration, but must be custom defined for a given parser’s transition system. We explore using a policy gradient method as a parser-agnostic alternative. In addition to directly optimizing for a tree-level metric such as F1, policy gradient has the potential to reduce exposure bias by allowing exploration during training; moreover, it does not require a dynamic oracle for supervision. On four constituency parsers in three languages, the method substantially outperforms static oracle likelihood training in almost all settings. For parsers where a dynamic oracle is available (including a novel oracle which we define for the transition system of Dyer et al., 2016), policy gradient typically recaptures a substantial fraction of the performance gain afforded by the dynamic oracle.
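To make the idea of training with exploration toward a tree-level reward concrete, here is a minimal sketch of a REINFORCE-style policy gradient estimate. It is an illustration only, not the authors' implementation: the toy policy (independent per-step logits), the function names `sample_actions` and `reinforce_gradient`, and the mean-reward baseline are all assumptions. In a real parser the logits would depend on the parser state, and the reward would be a tree-level score such as F1 of the tree built from the sampled actions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sample_actions(logits_per_step, rng):
    """Sample one action index per step from the toy policy (exploration)."""
    actions = []
    for logits in logits_per_step:
        probs = softmax(logits)
        actions.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return actions


def reinforce_gradient(logits_per_step, reward_fn, num_samples, rng):
    """Estimate the gradient of expected reward with respect to the logits.

    Each sampled action sequence contributes
    (reward - baseline) * grad log p(sequence), averaged over samples.
    The baseline is the mean sampled reward (an assumption of this sketch).
    """
    samples = []
    for _ in range(num_samples):
        actions = sample_actions(logits_per_step, rng)
        samples.append((actions, reward_fn(actions)))
    baseline = sum(reward for _, reward in samples) / num_samples

    grad = [[0.0] * len(logits) for logits in logits_per_step]
    for actions, reward in samples:
        advantage = reward - baseline
        for t, (logits, chosen) in enumerate(zip(logits_per_step, actions)):
            probs = softmax(logits)
            for j, p in enumerate(probs):
                # d log softmax(chosen) / d logit_j = 1[j == chosen] - p_j
                indicator = 1.0 if j == chosen else 0.0
                grad[t][j] += advantage * (indicator - p) / num_samples
    return grad


# Toy usage: two steps, two actions each; reward 1 only for the "gold" sequence.
rng = random.Random(0)
gold = [0, 1]
grad = reinforce_gradient(
    [[0.0, 0.0], [0.0, 0.0]],
    lambda actions: 1.0 if actions == gold else 0.0,
    num_samples=200,
    rng=rng,
)
```

Taking a gradient ascent step along `grad` raises the probability of action sequences that scored above the baseline. No oracle is consulted at any point, which is what makes the approach independent of the transition system.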

Highlights

Many recent state-of-the-art models for constituency parsing are transition based, decomposing production of each parse tree into a sequence of action decisions (Dyer et al., 2016; Cross and Huang, 2016; Liu and Zhang, 2017; Stern et al., 2017), building on a long line of work in transition-based parsing (Nivre, 2003; Yamada and Matsumoto, 2003; Henderson, 2004; Zhang and Clark, 2011; Chen and Manning, 2014; Andor et al., 2016; Kiperwasser and Goldberg, 2016).
We find that while policy gradient usually outperforms standard likelihood training, it typically underperforms the dynamic oracle-based methods, which provide direct, model-aware supervision about which actions are best to take from arbitrary parser states.
We investigate four parsers with varying transition systems and methods of encoding the current state and sentence: (1) the discriminative Recurrent Neural Network Grammars (RNNG) parser of Dyer et al. (2016), (2) the In-Order parser of Liu and Zhang (2017), (3) the Span-Based parser of Cross and Huang (2016), and (4) the Top-Down parser of Stern et al. (2017).

Summary

Introduction

Many recent state-of-the-art models for constituency parsing are transition based, decomposing production of each parse tree into a sequence of action decisions (Dyer et al., 2016; Cross and Huang, 2016; Liu and Zhang, 2017; Stern et al., 2017), building on a long line of work in transition-based parsing (Nivre, 2003; Yamada and Matsumoto, 2003; Henderson, 2004; Zhang and Clark, 2011; Chen and Manning, 2014; Andor et al., 2016; Kiperwasser and Goldberg, 2016). Models of this type, which decompose structure prediction into sequential decisions, can be prone to two issues (Ranzato et al., 2016; Wiseman and Rush, 2016). We obtain new state-of-the-art results for single-model discriminative transition-based parsers trained on the English PTB (92.6 F1), French Treebank (83.5 F1), and Penn Chinese Treebank Version 5.1 (87.0 F1).
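The F1 figures above are tree-level scores over labeled constituents. As a rough illustration of how such a score can be computed, the sketch below compares two trees represented as lists of `(label, start, end)` spans. The span representation and the function name `bracket_f1` are assumptions made for this example; standard evaluation tools apply additional conventions (for instance around punctuation) that are not modeled here.

```python
from collections import Counter


def bracket_f1(predicted, gold):
    """Labeled bracket F1 between two trees given as (label, start, end) spans.

    Spans are treated as multisets, so a span repeated in one tree is
    matched at most as many times as it appears in the other.
    """
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    matched = sum((pred_counts & gold_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy usage: the predicted tree gets the NP boundary wrong.
gold_spans = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)]
pred_spans = [("S", 0, 3), ("NP", 0, 2), ("VP", 1, 3)]
score = bracket_f1(pred_spans, gold_spans)  # two of three brackets match
```

A score of this kind, computed on a complete sampled tree, is the sort of quantity that can serve as the reward in policy gradient training.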

Models

Training Procedures

Policy Gradient

Dynamic Oracle Supervision

Experiments

Results and Discussion

Publication Date: Jan 1, 2018
Citations: 73
License type: cc-by
