Abstract
Universal Dependencies (UD) is becoming a standard cross-linguistic annotation scheme, but it has been argued that this scheme, which centers on content words, is harder to parse than the conventional one centering on function words. To improve the parsability of UD, we propose a back-and-forth conversion algorithm: we preprocess the training treebank to increase parsability, and reconvert the parser outputs to follow the UD scheme as a postprocessing step. We show that this technique consistently improves LAS across languages even with a state-of-the-art parser, in particular on core dependency arcs such as nominal modifiers. We also provide an in-depth analysis of why our method increases parsability.
Highlights
Introduction
The UD style centering on content words is more difficult to parse than the conventional style centering on function words, e.g., the tree in the lower part of Figure 1 (Schwartz et al., 2012; Ivanova et al., 2013)
There are several variations in annotation schemes for dependencies
What kinds of errors are reduced by our conversion? To inspect this, we compare the F1-scores of each arc label
Summary
The UD style centering on content words is more difficult to parse than the conventional style centering on function words, e.g., the tree in the lower part of Figure 1 (Schwartz et al., 2012; Ivanova et al., 2013). To overcome this issue, in this paper we show the effectiveness of a back-and-forth conversion approach in which we train a model and parse sentences in an annotation format with higher parsability, and then reconvert the parser output into the UD scheme. We limit the conversion targets to simpler constructions around function words while still covering many linguistic phenomena. Another limitation of previous work is the choice of parsers: MSTParser or MaltParser is often used, but these are no longer state-of-the-art.
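The back-and-forth pipeline can be illustrated with a minimal sketch. The token representation, the function names, and the single rule handled here (adpositions labeled `case` promoted to local heads and demoted again after parsing) are all illustrative assumptions, not the paper's actual rule set or implementation.

```python
def ud_to_function_head(tokens):
    """Preprocess: make each `case` adposition the head of its noun."""
    out = [dict(t) for t in tokens]
    for t in out:
        if t["deprel"] == "case":
            noun = next(x for x in out if x["id"] == t["head"])
            # the adposition takes over the noun's attachment point
            t["head"], t["deprel"] = noun["head"], noun["deprel"]
            # the noun now attaches under the adposition
            # ("pobj" is a placeholder label for this sketch)
            noun["head"], noun["deprel"] = t["id"], "pobj"
    return out

def function_head_to_ud(tokens):
    """Postprocess: restore the UD content-word-head analysis."""
    out = [dict(t) for t in tokens]
    for t in out:
        if t["deprel"] == "pobj":
            adp = next(x for x in out if x["id"] == t["head"])
            t["head"], t["deprel"] = adp["head"], adp["deprel"]
            adp["head"], adp["deprel"] = t["id"], "case"
    return out

# "sat on mat": UD attaches "on" to "mat" with the label `case`
ud = [
    {"id": 1, "form": "sat", "head": 0, "deprel": "root"},
    {"id": 2, "form": "on",  "head": 3, "deprel": "case"},
    {"id": 3, "form": "mat", "head": 1, "deprel": "obl"},
]
converted = ud_to_function_head(ud)          # "on" now heads "mat"
assert function_head_to_ud(converted) == ud  # round trip is lossless
```

In practice the preprocessing is applied to the training treebank, the parser is trained and run in the converted format, and the postprocessing maps its output back to UD for evaluation; the round-trip assertion above is the key invariant such a conversion must satisfy.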