Abstract

This paper contributes to the thread of research on the learnability of different dependency annotation schemes: one (‘semantic’) favouring content words as heads of dependency relations and the other (‘syntactic’) favouring syntactic heads. Several studies have lent support to the idea that choosing syntactic criteria for assigning heads in dependency trees improves the performance of dependency parsers. This may be explained by postulating that syntactic approaches are generally more learnable. In this study, we test this hypothesis by comparing the performance of five parsing systems (both transition- and graph-based) on a selection of 21 treebanks, each in a ‘semantic’ variant, represented by standard UD (Universal Dependencies), and a ‘syntactic’ variant, represented by SUD (Surface-syntactic Universal Dependencies). Unlike previously reported experiments, which considered learnability of ‘semantic’ and ‘syntactic’ annotations of particular constructions in vitro, the experiments reported here consider whole annotation schemes in vivo. Additionally, we compare these annotation schemes using a range of quantitative syntactic properties, which may also reflect their learnability. The results of the experiments show that SUD tends to be more learnable than UD, but the advantage of one or the other scheme depends on the parser and the corpus in question.

Highlights

  • This paper compares the learnability of two approaches to dependency annotation

  • The English GUM treebank has 48 and 43 different labels in UD and Surface-syntactic Universal Dependencies (SUD), respectively, and the label processing described in §2.3 reduces these further to 36 and 25 different labels, making the task of parsers easier in the case of SUD than in the case of UD (these predictions are confirmed by the results concerning Label Entropy, reported in §3.2 below)

  • The results reported cannot at this stage be interpreted as showing – but are compatible with the claim – that ‘functional headedness’ tends to be more learnable than ‘content headedness’; further experiments are needed to confirm or refute such a claim

  • More extensive experiments, covering a number of languages, a handful of constructions and a few parsers, such as those reported in Rehbein et al. (2017), showed that this relation between learnability and different approaches to headedness, though imperfect, in general favours syntactic-style approaches, and suggested a more stable correlation between learnability and other corpus characteristics
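The Label Entropy measure mentioned in the highlights can be sketched as plain Shannon entropy over the distribution of dependency-relation labels in a treebank. This is an assumption for illustration – the paper's exact definition is not reproduced on this page – and the `label_entropy` helper below is hypothetical:

```python
from collections import Counter
import math

def label_entropy(labels):
    """Shannon entropy (in bits) of a dependency-label distribution.

    A smaller, more skewed label inventory yields lower entropy,
    which is one plausible proxy for easier learnability.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy example: a skewed label distribution is more predictable
# (lower entropy) than a uniform one over the same labels.
skewed = ["nsubj"] * 8 + ["obj"] + ["obl"]
uniform = ["nsubj", "obj", "obl"] * 4
assert label_entropy(skewed) < label_entropy(uniform)
```

On this view, the reduction from 36 to 25 labels reported for SUD would tend to lower entropy, consistent with the highlight's prediction.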

Summary

Introduction and Background

This paper compares the learnability of two approaches to dependency annotation. One, represented by Universal Dependencies (UD), favours content words as heads of dependency relations; the other, represented by Surface-syntactic Universal Dependencies (SUD), favours syntactic heads. The experiments involved five different parsers (representing both transition-based and graph-based methodologies) and two different learnability measures (including one based on at…). Earlier studies of particular constructions include one reporting a small improvement observed in the case of the transition-based MaltParser, but not with the graph-based MSTParser, Rosa (2015) (adposition–noun constructions in 30 languages), and Kohita et al. (2017), who report on an experiment involving 24 languages, in which the original UD representation of verb groups (modal–verb constructions) turns out to be … of infinitivals introduced by to; in both cases, having the main lexical verb as the dependent – as in SUD, but unlike in UD – gives generally better results. In order to obtain robust results, only relatively large corpora, over …

Parsers
Evaluation
UAS and LAS scores
Quantitative syntactic properties
Conclusions
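The evaluation outline above mentions UAS and LAS scores. These standard attachment metrics can be sketched as follows; the `attachment_scores` helper and the toy sentences are illustrative, not taken from the paper:

```python
def attachment_scores(gold, pred):
    """UAS/LAS over one or more sentences.

    Each sentence is a list of (head, deprel) pairs, one per token.
    UAS = fraction of tokens with the correct head;
    LAS = fraction of tokens with the correct head AND the correct label.
    """
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1
                if g_rel == p_rel:
                    las_hits += 1
    return uas_hits / total, las_hits / total

# Toy sentence of three tokens: all heads are predicted correctly,
# but one label ("obj" vs "nmod") is wrong, so LAS < UAS.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "nmod")]]
uas, las = attachment_scores(gold, pred)  # uas = 1.0, las = 2/3
```

Because UD and SUD assign heads differently, the same parser output scores differently against the two schemes, which is why the paper evaluates each treebank variant against its own gold annotation.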