Multi-lingual Dependency Parsing Evaluation: a Large-scale Analysis of Word Order Properties using Artificial Data

Kristina Gulordava, Paola Merlo

doi:10.1162/tacl_a_00103

Open Access

Abstract

The growing work in multi-lingual parsing faces the challenge of fair comparative evaluation and performance analysis across languages and their treebanks. The difficulty lies in teasing apart the properties of treebanks, such as their size or average sentence length, from those of the annotation scheme, and from the linguistic properties of languages. We propose a method to evaluate the effects of word order of a language on dependency parsing performance, while controlling for confounding treebank properties. The method uses artificially-generated treebanks that are minimal permutations of actual treebanks with respect to two word order properties: word order variation and dependency lengths. Based on these artificial data on twelve languages, we show that longer dependencies and higher word order variability degrade parsing performance. Our method also extends to minimal pairs of individual sentences, leading to a finer-grained understanding of parsing errors.
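To make the idea of a minimal permutation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that dependency length is the linear distance between a dependent and its head, and that a sentence is encoded as a list of 1-based head indices with 0 marking the root. The function names `dependency_lengths`, `total_dependency_length`, and `permute`, and the four-token example, are hypothetical and chosen for illustration only.

```python
def dependency_lengths(heads):
    """Return the linear distance between each non-root token and its head.

    heads[i] is the 1-based position of the head of token i + 1;
    0 marks the root, which has no incoming arc.
    """
    return [abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0]


def total_dependency_length(heads):
    """Sum of all dependency lengths in one sentence."""
    return sum(dependency_lengths(heads))


def permute(heads, order):
    """Re-linearise a tree without changing its arcs.

    order lists the old 1-based token positions in their new linear order.
    The returned head list describes the same tree under the new order.
    """
    new_pos = {old: new + 1 for new, old in enumerate(order)}
    return [0 if heads[old - 1] == 0 else new_pos[heads[old - 1]]
            for old in order]


# Hypothetical sentence: token 2 is the root, tokens 1 and 4 attach to it,
# and token 3 attaches to token 4.
original = [2, 0, 4, 2]
# Same tree with tokens 3 and 4 swapped, so token 4 sits next to its head.
permuted = permute(original, [1, 2, 4, 3])

print(total_dependency_length(original))  # 4
print(total_dependency_length(permuted))  # 3
```

The two head lists encode the same tree, so any difference in total dependency length comes from word order alone. That is the sense in which a permuted treebank isolates a word order property while the other treebank properties stay fixed.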

Highlights

Fair comparative performance evaluation across languages and their treebanks is one of the difficulties for work on multi-lingual parsing (Buchholz and Marsi, 2006; Nivre et al., 2007; Seddah et al., 2011).
In a set of pairwise comparisons between original and permuted treebanks, we confirm the influence of word order variability and dependency length on parsing performance, at the large scale provided by fourteen different treebanks across twelve different languages.
The graph-based architecture is known to be less dependent on word order and dependency length than transition-based dependency parsers, as it searches the whole space of possible parse trees and solves a global optimisation problem (McDonald and Nivre, 2011).

Summary

Introduction

Fair comparative performance evaluation across languages and their treebanks is one of the difficulties for work on multi-lingual parsing (Buchholz and Marsi, 2006; Nivre et al., 2007; Seddah et al., 2011). We compare how parsing performance on the original and the permuted trees varies in relation to quantified measures of the dependency length and word order variation of the treebanks. Morphologically rich languages are known to be hard to parse, as rich morphology increases the percentage of new words in the test set (Nivre et al., 2007; Tsarfaty et al., 2010). These languages often exhibit very flexible word order. In a set of pairwise comparisons between original and permuted treebanks, we confirm the influence of word order variability and dependency length on parsing performance, at the large scale provided by fourteen different treebanks across twelve different languages. Using one treebank as an example, we show how our method can be extended to provide finer-grained analyses at the sentence level and relate the parsing errors to properties of the parsing architecture.
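As an illustration of what a quantified measure of word order variation could look like, here is a minimal sketch under stated assumptions; it is not taken from the paper. It assumes that variation is measured as the entropy of arc direction (head before or after the dependent), computed separately for each dependency relation label and averaged with weights proportional to relation frequency. The conditioning on the label alone, the function name `arc_direction_entropy`, and the toy arcs are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def arc_direction_entropy(arcs):
    """Frequency-weighted average entropy (in bits) of arc direction.

    arcs is an iterable of (label, head_position, dependent_position).
    Conditioning on the relation label only is an illustrative assumption.
    """
    counts = defaultdict(Counter)
    for label, head, dep in arcs:
        direction = "head-first" if head < dep else "head-last"
        counts[label][direction] += 1

    total = sum(sum(c.values()) for c in counts.values())
    entropy = 0.0
    for c in counts.values():
        n = sum(c.values())
        # Entropy of the two-way direction distribution for this label.
        h = -sum((k / n) * math.log2(k / n) for k in c.values())
        entropy += (n / total) * h
    return entropy


# Hypothetical arcs: the modifier always precedes its head.
fixed_order = [("amod", 2, 1), ("amod", 4, 3)]
# Hypothetical arcs: the modifier precedes its head half of the time.
free_order = [("amod", 2, 1), ("amod", 3, 4)]

print(arc_direction_entropy(fixed_order))  # 0.0
print(arc_direction_entropy(free_order))   # 1.0
```

Under this sketch, a treebank in which every relation has a fixed direction scores 0, and a treebank in which each relation is split evenly between the two directions scores 1 bit.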

Methodology

Word order properties

Creating trees with optimal DL

Creating trees with optimal Entropy

Dependency Treebanks

Word order properties of original and permuted treebanks

Parsing setup

Comparison of parsing performance between original and permuted treebanks

Sentence-level analysis of parsing performance

General discussion

Related work

Findings

Conclusions

Journal: Transactions of the Association for Computational Linguistics
Publication Date: Dec 1, 2016
Citations: 42
License type: cc-by
