Abstract
The best solution of structured prediction models in NLP is often inaccurate, due to the limited expressive power of the model or to inexact parameter estimation. One way to mitigate this problem is sampling candidate solutions from the model's solution space, the reasoning being that effective exploration of this space should yield high-quality solutions. Unfortunately, sampling is often computationally hard, and many works hence back off to sub-optimal strategies, such as extracting the best-scoring solutions of the model, which are not as diverse as sampled solutions. In this paper we propose a perturbation-based approach in which sampling from a probabilistic model is computationally efficient. We present a learning algorithm for the variance of the perturbations and empirically demonstrate its importance. Moreover, while finding the argmax in our model is intractable, we propose an efficient and effective approximation. We apply our framework to cross-lingual dependency parsing across 72 corpora from 42 languages and to lightly supervised dependency parsing across 13 corpora from 12 languages, and demonstrate strong results in terms of both the quality of the entire solution list and of the final solution.
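To make the sampling step concrete, here is a minimal Python sketch of perturbation-based sampling, assuming Gumbel-distributed noise added to the model's part scores before exact decoding (the perturb-and-MAP recipe); the scale `sigma` stands in for the learned perturbation variance, and `decode` is an assumed black-box argmax decoder, not the paper's exact implementation.

```python
import numpy as np

def perturb_and_decode(scores, decode, sigma, K, rng=None):
    """Draw K candidate structures by perturbing the model's scores.

    scores: array of part scores (e.g., an (n, n) arc-score matrix
            for dependency parsing).
    decode: exact argmax decoder over a score array
            (e.g., Chu-Liu-Edmonds for non-projective trees).
    sigma:  scale of the perturbations; in the paper this variance
            is learned from data rather than fixed.
    """
    rng = rng or np.random.default_rng()
    samples = []
    for _ in range(K):
        # i.i.d. Gumbel perturbations of every part score (assumption)
        noise = sigma * rng.gumbel(size=scores.shape)
        samples.append(decode(scores + noise))
    return samples
```

Because each sample only requires one run of the exact decoder on perturbed scores, the per-sample cost matches that of ordinary argmax decoding.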
Highlights
Structured prediction problems are ubiquitous in Natural Language Processing (NLP) (Smith, 2011)
We present a perturbation-based framework for structured prediction in NLP.
Our algorithmic contributions include an algorithm for data-driven estimation of the perturbation variance and a max-over-marginals (MOM) algorithm for distilling a final solution from the K-list.
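For the second contribution, a hedged sketch of what a max-over-marginals step could look like for dependency parsing, continuing the assumptions above: each sampled tree is an array of per-token heads, arc marginals are approximated by counts over the K-list, and each token takes its highest-count head. The per-token argmax and the count-based marginals are illustrative; the paper's MOM algorithm may differ in detail.

```python
import numpy as np

def max_over_marginals(sampled_heads, n):
    """Distill one structure from a K-list of sampled dependency trees.

    sampled_heads: list of K integer arrays of length n, where entry m
                   is the head of modifier m (0 denotes the root).
    Returns the per-modifier head with the highest empirical marginal
    count over the K samples. Note: the output is not guaranteed to
    form a valid tree; a final projection step may be required.
    """
    # counts[h, m]: number of samples in which head h governs modifier m
    counts = np.zeros((n + 1, n), dtype=int)
    for heads in sampled_heads:
        for m, h in enumerate(heads):
            counts[h, m] += 1
    return counts.argmax(axis=0)  # best head per modifier
```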
Summary
Structured prediction problems are ubiquitous in Natural Language Processing (NLP) (Smith, 2011). While in most cases models for such problems are designed to predict the single highest-quality structure of the input example (e.g., a sentence or a document), in many cases a diverse list of meaningful structures is of fundamental importance, and it can even be a defining property of the task. In extractive summarization (Nenkova and McKeown, 2011), good summaries are those that consist of a high-quality and diverse list of sentences extracted from the text. Dependency forests have been used to improve machine translation (Tu et al., 2010; Ma et al., 2018) and sentiment analysis (Tu et al., 2012).