Abstract

Most unsupervised dependency parsers are based on first-order probabilistic generative models that only consider local parent-child information. Inspired by second-order supervised dependency parsing, we propose a second-order extension of unsupervised neural dependency models that incorporates grandparent-child or sibling information. We also propose a novel design for the neural parameterization and optimization of these models. In second-order models, the number of grammar rules grows cubically with the vocabulary size, making it difficult to train lexicalized models that may contain thousands of words. To circumvent this problem while still benefiting from both second-order parsing and lexicalization, we use the agreement-based learning framework to jointly train a second-order unlexicalized model and a first-order lexicalized model. Experiments on multiple datasets show the effectiveness of our second-order models compared with recent state-of-the-art methods. Our joint model achieves a 10% improvement over the previous state-of-the-art parser on the full WSJ test set.

Highlights

  • Dependency parsing is a classical task in natural language processing

  • The experimental results demonstrate that our models achieve state-of-the-art accuracy on unsupervised dependency parsing

  • There are three types of probabilistic grammar rules in a Dependency Model with Valence (DMV), namely ROOT, CHILD and DECISION rules, each associated with a set of multinomial distributions P_ROOT(c), P_CHILD(c | p, dir, val) and P_DECISION(dec | p, dir, val), where p is the parent token, c is the child token, dec is the continue/stop decision, dir indicates the direction of generation, and val indicates whether parent p has generated any child in direction dir
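
As a concrete illustration of how these three distributions fit together, here is a minimal sketch in Python; the vocabulary size, array shapes and random initialization are illustrative assumptions rather than details taken from the paper.

    # Minimal sketch of the three DMV rule-probability tables.
    # All sizes and values below are illustrative, not from the paper.
    import numpy as np

    V = 35       # hypothetical vocabulary size (e.g. POS tags in an unlexicalized model)
    DIRS = 2     # direction of generation: 0 = left, 1 = right
    VALS = 2     # valence: has the parent already generated a child in this direction?

    rng = np.random.default_rng(0)

    def normalize(x, axis=-1):
        # Turn non-negative scores into multinomial distributions along `axis`.
        return x / x.sum(axis=axis, keepdims=True)

    # P_ROOT(c): distribution over the token attached to the artificial root.
    p_root = normalize(rng.random(V))

    # P_CHILD(c | p, dir, val): for each parent p, direction dir and valence val,
    # a distribution over the next child token c.
    p_child = normalize(rng.random((V, DIRS, VALS, V)))

    # P_DECISION(dec | p, dir, val): probability of continuing (dec = 0) versus
    # stopping (dec = 1) the generation of children for p in direction dir.
    p_decision = normalize(rng.random((V, DIRS, VALS, 2)))

    # Example query: probability that parent 3 generates child 7 to its right
    # before having generated any right child.
    print(p_child[3, 1, 0, 7])

In a neuralized DMV these tables are computed by a neural network rather than stored as free multinomial parameters, but the rule inventory is the same.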


Summary

Introduction

Dependency parsing is a classical task in natural language processing. The head-dependent relations produced by dependency parsing approximate the semantic relationships between words, which is useful in many downstream NLP tasks such as machine translation, information extraction and question answering. Most methods in the literature on unsupervised dependency parsing are based on the Dependency Model with Valence (DMV) (Klein and Manning, 2004), a probabilistic generative model that considers only local parent-child information. To incorporate more contextual information into the scoring or prediction of dependency arcs, researchers often turn to discriminative methods; for the DMV, Han et al. (2019) propose a discriminative neural DMV that uses a global sentence embedding to introduce contextual information into the calculation of grammar rule probabilities. The literature on supervised graph-based dependency parsing offers another technique for incorporating contextual information: second-order parsing, which scores grandparent-child or sibling structures rather than single arcs, and which we adapt to the neural DMV. One particular challenge faced by second-order neural DMVs is that the number of grammar rules grows cubically with the vocabulary size, making it difficult to store and train a lexicalized model containing thousands of words. The experimental results demonstrate that our models achieve state-of-the-art accuracy on unsupervised dependency parsing.
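
To make the cubic scaling concrete, here is a small back-of-the-envelope calculation; the vocabulary sizes are illustrative assumptions, and direction and valence factors are ignored for simplicity.

    # Rough count of CHILD-rule combinations by model order.
    def child_rule_count(vocab_size, order):
        # first-order rules condition a child on its parent: |V|^2 combinations;
        # second-order rules condition on (grandparent, parent): |V|^3 combinations.
        return vocab_size ** (order + 1)

    for vocab_size in (35, 10_000):   # roughly POS-tag-sized vs. lexicalized
        print(vocab_size,
              child_rule_count(vocab_size, order=1),
              child_rule_count(vocab_size, order=2))
    # With a POS-tag-sized vocabulary both counts stay small; with a lexicalized
    # vocabulary of 10,000 words the second-order count reaches 10^12, which is
    # why the second-order model is kept unlexicalized and paired with a
    # first-order lexicalized model via agreement-based learning.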

Dependency Model With Valence
Neuralized DMV Models
Second-Order Parsing
Parameterization
Agreement-Based Learning
Datasets and Setting
Results
Methods
Comparison of Training Methods
Effect of Joint Training and Parsing
Limitations
Conclusion
Findings
Inside Algorithm and Parsing Algorithm
