Abstract

Universal Dependencies (UD) is becoming a standard annotation scheme across languages, but it has been argued that this scheme, which centers on content words, is harder to parse than the conventional one centering on function words. To improve the parsability of UD, we propose a back-and-forth conversion algorithm, in which we preprocess the training treebank to increase parsability and reconvert the parser outputs to the UD scheme as a postprocessing step. We show that this technique consistently improves LAS across languages even with a state-of-the-art parser, in particular on core dependency arcs such as nominal modifiers. We also provide an in-depth analysis of why our method increases parsability.

Highlights

  • It is argued that the UD annotation style centering on content words is more difficult to parse than the conventional style centering on function words (Schwartz et al., 2012; Ivanova et al., 2013)

  • There are several variations in dependency annotation schemes

  • What kinds of errors are reduced by our conversion? To inspect this, we compare the F1-scores of each arc label


Summary

Introduction

It is argued that the UD annotation style centering on content words is more difficult to parse than the conventional style centering on function words, e.g., the tree in the lower part of Figure 1 (Schwartz et al., 2012; Ivanova et al., 2013). To overcome this issue, in this paper we show the effectiveness of a back-and-forth conversion approach in which we train a model and parse sentences in an annotation format with higher parsability, and then reconvert the parser output into the UD scheme. We limit the conversion targets to simpler constructions around function words while still covering many linguistic phenomena. Another limitation of previous work is the choice of parsers: MSTParser or MaltParser is often used, but they are not state-of-the-art today.
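The back-and-forth idea can be sketched for a single phenomenon: adpositions, which UD attaches to their nominal via the `case` relation. The sketch below is a simplified, hypothetical illustration only, assuming trees are maps from token id to (head id, deprel); the paper's actual conversion covers more constructions and richer tree representations.

```python
# Hypothetical sketch: swap heads for the UD `case` relation so that the
# function word (adposition) heads the nominal, then invert on parser output.

def to_function_head(tree):
    """UD -> function-word-head style: promote each `case` word to head."""
    new = {i: (h, d) for i, (h, d) in tree.items()}
    for i, (h, d) in tree.items():
        if d == "case":
            gh, gd = tree[h]      # outer arc of the nominal
            new[i] = (gh, gd)     # adposition takes over the nominal's arc
            new[h] = (i, "case")  # nominal now depends on the adposition
    return new

def to_ud(tree):
    """Inverse conversion, applied to parser output as a postprocess."""
    new = {i: (h, d) for i, (h, d) in tree.items()}
    for i, (h, d) in tree.items():
        if d == "case":           # here i is the nominal under an adposition
            ph, pd = tree[h]      # outer arc currently held by the adposition
            new[i] = (ph, pd)     # nominal reclaims the outer arc
            new[h] = (i, "case")  # adposition goes back under the nominal
    return new

# "in Paris": token 1 = "in", token 2 = "Paris"; in UD, "in" attaches to
# "Paris" via `case`, and "Paris" attaches to the root (0) via `obl`.
ud = {1: (2, "case"), 2: (0, "obl")}
converted = to_function_head(ud)   # {1: (0, 'obl'), 2: (1, 'case')}
assert to_ud(converted) == ud      # the round trip restores the UD tree
```

The round-trip property is what makes the approach safe: any arc the conversion touches can be restored deterministically after parsing, so the final output still follows the UD scheme.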

Conversion method
Experimental Setting
Result
Findings
Conclusion and Future Work
