Abstract
The article examines the Universal Dependencies (UD) annotation scheme. The UD project is an international initiative to produce treebanks of the world’s languages that are annotated in a cross-linguistically consistent manner. A central aspect of the UD annotation scheme is its analysis of function words: the scheme advocates subordinating function words to content words. This article discusses the linguistic and practical motivations behind that decision. It demonstrates that UD choices in this area are not supported linguistically. At the same time, the near convertibility of the UD treebanks to a more linguistically motivated annotation format means that the UD initiative remains of great value to linguistics in general.
Highlights
The Universal Dependencies (UD) project is a large-scale effort involving many dozens of researchers internationally to produce consistently annotated treebanks of the world’s languages (UD webpage: http://universaldependencies.org/). The consistency of annotation takes the form of adherence to a single annotation scheme
The article emphasizes that the potential for automated conversion of the UD corpora to a linguistically well-motivated annotation format means that the UD project is of great value to linguistics in general
In the crosslinguistic big picture, an annotation scheme that results in lower mean dependency distance (MDD) numbers is linguistically more plausible – other things being equal – since it is more consistent with the human tendency to reduce linguistic complexity in the interest of easing the burden on working memory
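As a rough illustration of the MDD measure mentioned in the last highlight, the sketch below computes it for one toy sentence under two analyses. It assumes the common definition of dependency distance as the absolute difference between the linear positions of a dependent and its head, averaged over all non-root words. The sentence, both head assignments, and the function name `mean_dependency_distance` are illustrative assumptions and are not taken from the article.

```python
def mean_dependency_distance(heads):
    """Return the mean dependency distance (MDD) of one sentence.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which contributes no distance.
    """
    distances = [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    # A one-word sentence has no dependencies; report 0.0 by convention.
    return sum(distances) / len(distances) if distances else 0.0


# Toy sentence (hypothetical): Fred=1 sat=2 on=3 the=4 chair=5
# Content-word-head analysis: "on" depends on "chair", "chair" on "sat".
content_head = [2, 0, 5, 5, 2]
# Function-word-head analysis: "on" depends on "sat", "chair" on "on".
function_head = [2, 0, 2, 5, 3]

mdd_content = mean_dependency_distance(content_head)
mdd_function = mean_dependency_distance(function_head)
print(mdd_content, mdd_function)
```

For this one sentence, attaching the preposition to the verb gives the lower value (1.25 versus 1.75). A single toy example says nothing about corpus-level tendencies; it only shows how the choice of head changes the number.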
Summary
The Universal Dependencies (UD) project is a large-scale effort involving many dozens of researchers internationally to produce consistently annotated treebanks of the world’s languages (UD webpage: http://universaldependencies.org/). The consistency of annotation takes the form of adherence to a single annotation scheme. The purely syntactic analysis, in contrast, is more plausible because it has subcategorization pointing down the hierarchy, from the head function verb has to the dependent content verb eaten. The result of these choices is a situation in which the content verb in the subordinate clause becomes the root of the entire sentence, as shown with tried in (9a). This result is quite implausible, as there are two competing subjects, problem and this, a fact that the authors of UD have realized, since they reject the analysis in (9a) and adopt the one in (9b).