Abstract

This article examines the Universal Dependencies (UD) annotation scheme. The UD project is an international initiative to produce treebanks of the world’s languages, annotated in a cross-linguistically consistent manner. A central aspect of the UD annotation scheme is its analysis of function words: the scheme subordinates function words to content words. This article discusses the linguistic and practical motivations behind that decision and demonstrates that the UD choices in this area are not supported linguistically. At the same time, the near-convertibility of the UD treebanks to a more linguistically motivated annotation format means that the UD initiative remains of great value to linguistics in general.

Highlights

  • The Universal Dependencies (UD) project is a large-scale effort involving many dozens of researchers internationally to produce consistently annotated treebanks of the world’s languages (UD webpage: http://universaldependencies.org/). Consistency is achieved through adherence to a single annotation scheme.

  • The article emphasizes that the potential for automated conversion of the UD corpora to a linguistically well-motivated annotation format means that the UD project is of great value to linguistics in general.

  • In the crosslinguistic big picture, an annotation scheme that results in lower mean dependency distance (MDD) numbers is linguistically more plausible – other things being equal – since it is more consistent with the human tendency to reduce linguistic complexity in the interest of easing the burden on working memory.
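
The MDD measure mentioned in the last highlight can be illustrated with a minimal sketch. It assumes the common definition of MDD as the average absolute difference between the linear positions of each dependent and its head, with the root excluded; the text above does not spell out the definition, so treat that as an assumption. The function name and the three-word example sentence "She has eaten" are hypothetical and are not taken from the article.

```python
from typing import Sequence


def mean_dependency_distance(heads: Sequence[int]) -> float:
    """Average |dependent position - head position| over all non-root words.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which has no incoming dependency.
    """
    distances = [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        raise ValueError("sentence has no dependencies")
    return sum(distances) / len(distances)


# Hypothetical sentence: "She has eaten" (positions 1, 2, 3).
# Content-word head: both "She" and "has" attach to "eaten".
content_head = [3, 3, 0]
# Function-word head: "has" is the root; "She" and "eaten" attach to it.
function_head = [2, 0, 2]

print(mean_dependency_distance(content_head))   # 1.5
print(mean_dependency_distance(function_head))  # 1.0
```

In this toy case the analysis with the function verb as head yields the lower MDD. A single sentence proves nothing about a treebank, of course; it only shows how the number is computed.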


Summary

Introduction

The Universal Dependencies (UD) project is a large-scale effort involving many dozens of researchers internationally to produce consistently annotated treebanks of the world’s languages (UD webpage: http://universaldependencies.org/). Consistency is achieved through adherence to a single annotation scheme.

The purely syntactic analysis, in contrast, is more plausible because it has subcategorization pointing down the hierarchy, from the head function verb ‘has’ to the dependent content verb ‘eaten’.

The result of these choices is a situation in which the content verb in the subordinate clause becomes the root of the entire sentence, as shown with ‘tried’ in (9a). This result is quite implausible, as there are two competing subjects, ‘problem’ and ‘this’, a fact that the authors of UD have recognized, since they reject the analysis in (9a) and adopt the one in (9b).
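
The difference between the two analyses of an auxiliary plus content verb, and the idea of converting one into the other automatically, can be sketched as follows. This is a simplified illustration under stated assumptions, not the conversion procedure used in the article: it handles only a single auxiliary per verb, it assumes the subject re-attaches to the promoted auxiliary, and the relation label `comp` for the demoted content verb is a hypothetical placeholder. The labels `aux`, `nsubj`, and `root` are UD relation names; the `Token` class, the function name, and the example sentence "She has eaten" are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    form: str
    head: int    # position of the head; 0 marks the root
    deprel: str


def promote_auxiliaries(tokens: List[Token]) -> List[Token]:
    """Return a copy in which each 'aux' token heads its content verb."""
    out = [Token(t.idx, t.form, t.head, t.deprel) for t in tokens]
    by_idx = {t.idx: t for t in out}
    for aux in out:
        if aux.deprel != "aux":
            continue
        verb = by_idx[aux.head]
        # The auxiliary takes over the verb's attachment ...
        aux.head, aux.deprel = verb.head, verb.deprel
        # ... and the content verb becomes its dependent.
        # "comp" is a hypothetical placeholder label.
        verb.head, verb.deprel = aux.idx, "comp"
        # Assumption: the subject attaches to the promoted auxiliary.
        for t in out:
            if t.head == verb.idx and t.deprel == "nsubj":
                t.head = aux.idx
    return out


# UD-style input: the content verb "eaten" is the root.
ud_style = [
    Token(1, "She", 3, "nsubj"),
    Token(2, "has", 3, "aux"),
    Token(3, "eaten", 0, "root"),
]

for tok in promote_auxiliaries(ud_style):
    print(tok.idx, tok.form, tok.head, tok.deprel)
```

After conversion, ‘has’ is the root and both ‘She’ and ‘eaten’ depend on it, so the dependency now points down from the function verb to the content verb, as in the purely syntactic analysis described above.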

Syntax over semantics
Structural parallelism
Head-dependent ordering
Converting to purely syntactic annotation
Findings
Concluding comments