Abstract

We give a general framework for inference in spanning tree models. We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models. Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms. These algorithms are easy to implement with or without automatic differentiation software. We motivate the development of our framework with several cautionary tales of previous research, which has developed numerous inefficient algorithms for computing expectations and their gradients. We demonstrate how our framework efficiently computes several quantities with known algorithms, including the expected attachment score, entropy, and generalized expectation criteria. As a bonus, we give algorithms for quantities that are missing in the literature, including the KL divergence. In all cases, our approach matches the efficiency of existing algorithms and, in several cases, reduces the runtime complexity by a factor of the sentence length. We validate the implementation of our framework through runtime experiments. We find our algorithms are up to 15 and 9 times faster than previous algorithms for computing the Shannon entropy and the gradient of the generalized expectation objective, respectively.

Highlights

  • Dependency trees are a fundamental combinatorial structure in natural language processing

  • We focus on edge-factored models, where the probability of a dependency tree is proportional to the product of the weights of its edges

  • We show that the gradient of the generalized expectation (GE) criterion can be evaluated in O(N³)
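The first two highlights can be made concrete. In an edge-factored model, the partition function Z is the sum, over all spanning trees, of the product of the tree's edge weights, and the Matrix–Tree Theorem computes Z as a determinant in O(N³) rather than by enumerating exponentially many trees. The sketch below is our illustration (not the paper's reference implementation): it computes Z for arborescences rooted at node 0 via the determinant of the Laplacian's root minor, and checks the result against brute-force enumeration on a small graph.

```python
import itertools
import numpy as np

def partition_function_mtt(W):
    """Z = sum over arborescences rooted at node 0 of the product of
    edge weights, via the directed Matrix-Tree Theorem: Z equals the
    determinant of the Laplacian with the root row/column deleted."""
    A = W.copy()
    np.fill_diagonal(A, 0.0)            # ignore self-loops
    L = np.diag(A.sum(axis=0)) - A      # L[j, j] = total weight into j
    return np.linalg.det(L[1:, 1:])     # delete the root's row and column

def partition_function_brute(W):
    """Same quantity by enumerating every head assignment and keeping
    the acyclic ones -- exponential time, for checking only."""
    n = W.shape[0]
    total = 0.0
    for heads in itertools.product(range(n), repeat=n - 1):
        # heads[j - 1] is the head (parent) of node j; node 0 is the root
        if all(_reaches_root(heads, j) for j in range(1, n)):
            total += np.prod([W[heads[j - 1], j] for j in range(1, n)])
    return total

def _reaches_root(heads, j):
    """Follow parent pointers from j; True iff we reach the root."""
    seen = set()
    while j != 0:
        if j in seen:                   # hit a cycle
            return False
        seen.add(j)
        j = heads[j - 1]
    return True

rng = np.random.default_rng(0)
W = rng.uniform(0.5, 2.0, size=(4, 4))  # toy 4-node weight matrix
print(partition_function_mtt(W), partition_function_brute(W))
```

The determinant and the brute-force sum agree to floating-point precision; the determinant costs O(N³) while enumeration grows exponentially in N.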



Introduction

Dependency trees are a fundamental combinatorial structure in natural language processing. Classical results such as the Matrix–Tree Theorem enable efficient inference in edge-factored models, and they are still used in more recent work (Ma and Hovy, 2017; Liu and Lapata, 2018). We build upon this tradition through a framework for computing expectations of a rich family of functions under a distribution over trees. Our framework is motivated by the lack of a unified approach for computing expectations over spanning trees in the literature; we believe this gap has resulted in the publication of numerous inefficient algorithms. We have released a reference implementation at the following URL: https://github.com/rycolab/tree_expectations
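The gradient–expectation connection the paper builds on can be illustrated for first-order expectations: the gradient of log Z with respect to an edge's weight, scaled by that weight, is the marginal probability that the edge appears in a sampled tree. The sketch below is our own illustration (not the paper's pseudocode), using the standard identity that these gradients come from the inverse of the Laplacian's root minor (Jacobi's formula), so all N² edge marginals cost O(N³) in total.

```python
import numpy as np

def edge_marginals(W):
    """p(edge i -> j appears in the tree) for an edge-factored
    distribution over arborescences rooted at node 0.  Each marginal is
    W[i, j] * d(log Z)/d(W[i, j]); by Jacobi's formula these derivatives
    are entries of the inverse of the Laplacian's root minor."""
    n = W.shape[0]
    A = W.copy()
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=0)) - A      # Laplacian; columns index dependents
    B = np.linalg.inv(L[1:, 1:])        # inverse root minor; Z = det(L[1:, 1:])
    M = np.zeros_like(W)
    for i in range(n):                  # head i
        for j in range(1, n):           # dependent j (never the root)
            if i == j:
                continue
            g = B[j - 1, j - 1]          # from dL[j, j] / dW[i, j] = +1
            if i >= 1:
                g -= B[j - 1, i - 1]     # from dL[i, j] / dW[i, j] = -1
            M[i, j] = W[i, j] * g
    return M

rng = np.random.default_rng(1)
W = rng.uniform(0.5, 2.0, size=(4, 4))
M = edge_marginals(W)
print(M[:, 1:].sum(axis=0))  # marginals into each non-root node sum to 1
```

The printed column sums are all 1 because every non-root word has exactly one head in every tree; the same gradient identity is what lets automatic differentiation recover these expectations from a log Z routine.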

Distributions over Trees
The Matrix–Tree Theorem
Dependency parsing and the Laplacian Zoo
Expectations
Connecting Gradients and Expectations
Algorithms
Derivatives of Z
Complexity Analysis
Applications and Prior Work
Shannon Entropy
Kullback–Leibler Divergence
Gradient of the GE Objective
Conclusion
B Proof of T2

