Abstract

We describe an empirical method to explore and contrast the roles of default and principal part information in the differentiation of inflectional classes. We use an unsupervised machine learning method to classify Russian nouns into inflectional classes, first with full paradigm information, and then with particular types of information removed. When we remove default information, which is shared across classes, we expect little effect on the classification. In contrast, when we remove principal part information, we expect a more detrimental effect on classification performance. Our data set consists of paradigm listings of the 80 most frequent Russian nouns, generated from a formal theory which allows us to distinguish default and principal part information. Our results show that removal of forms classified as principal parts has a more detrimental effect on the classification than removal of default information. However, we also find that there are differences within the defaults and within the principal parts, and we suggest that these may in part be attributable to stress patterns.

Highlights

  • The particular challenge which languages with inflectional classes pose is that these classes create an additional layer of complexity which is more or less irrelevant from the perspective of syntax

  • They represent a particular kind of morphological complexity which it is important to distinguish from other phenomena which may be associated with these terms

  • In this paper we explore how well an unsupervised learning method classifies nouns into inflectional classes, and consider the degree to which these classes match with ones which have been identified for Russian


Summary

Introduction

The particular challenge which languages with inflectional classes pose is that these classes create an additional layer of complexity which is more or less irrelevant from the perspective of syntax.

           ‘deed’      ‘factory’   ‘country’   ‘bone’
           Class IV    Class I     Class II    Class III
NOM SG     del-o       zavod       stran-a     kost
ACC SG     del-o       zavod       stran-u     kost
GEN SG     del-a       zavod-a     stran-i     kost-i
DAT SG     del-u       zavod-u     stran-e     kost-i
PREP SG    del-e       zavod-e     stran-e     kost-i
INS SG     del-om      zavod-om    stran-oj    kost-ju
NOM PL     del-a       zavod-i     stran-i     kost-i
ACC PL     del-a       zavod-i     stran-i     kost-i
GEN PL     del         zavod-ov    stran       kost-ej
DAT PL     del-am      zavod-am    stran-am    kost-am
PREP PL    del-ax      zavod-ax    stran-ax    kost-ax
INS PL     del-ami     zavod-ami   stran-ami   kost-ami

This complexity cannot be explained by the role of gender assignment. The distinction between classes II and III, for example, has no ramifications in the rules of agreement. This is pure morphological complexity, whereby one and the same grammatical distinction can be expressed in a number of different ways. This is additional structure which is not relevant from the point of view of syntax. It is complexity associated with autonomous morphology in the sense of Aronoff (1994).
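To make concrete the observation that one grammatical distinction can be expressed in several ways, the following minimal sketch encodes the endings of the four nouns listed above and reports which paradigm cells are identical across all four classes and which cells separate a given pair of classes. The names `CELLS`, `ENDINGS`, `shared_cells` and `distinguishing_cells` are hypothetical and introduced only for illustration. The comparison is a plain equality check on endings; it is not the paper's formal definition of defaults or principal parts, and it is not the compression-based method the paper uses.

```python
# Endings copied from the four example paradigms above; "" marks a bare stem.
# All names here are illustrative assumptions, not taken from the paper.
CELLS = ["NOM SG", "ACC SG", "GEN SG", "DAT SG", "PREP SG", "INS SG",
         "NOM PL", "ACC PL", "GEN PL", "DAT PL", "PREP PL", "INS PL"]

ENDINGS = {
    "I":   ["", "", "a", "u", "e", "om", "i", "i", "ov", "am", "ax", "ami"],
    "II":  ["a", "u", "i", "e", "e", "oj", "i", "i", "", "am", "ax", "ami"],
    "III": ["", "", "i", "i", "i", "ju", "i", "i", "ej", "am", "ax", "ami"],
    "IV":  ["o", "o", "a", "u", "e", "om", "a", "a", "", "am", "ax", "ami"],
}


def endings_by_cell(endings):
    """Map each paradigm cell to the set of distinct endings across classes."""
    return {cell: {forms[i] for forms in endings.values()}
            for i, cell in enumerate(CELLS)}


def shared_cells(endings):
    """Cells whose ending is the same in every class."""
    return [cell for cell, variants in endings_by_cell(endings).items()
            if len(variants) == 1]


def distinguishing_cells(endings, class_a, class_b):
    """Cells in which two classes use different endings."""
    return [cell for i, cell in enumerate(CELLS)
            if endings[class_a][i] != endings[class_b][i]]


if __name__ == "__main__":
    print("Shared by all classes:", shared_cells(ENDINGS))
    print("II vs III differ in:  ", distinguishing_cells(ENDINGS, "II", "III"))
```

For these four nouns, only the dative, prepositional and instrumental plural are identical across all classes, while classes II and III differ in most singular cells and in the genitive plural.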

Defaults and principal parts
Compression-based machine learning
Extracting classes from unordered trees
Evaluating inflectional class results
Summary of the experimental method
Data format
Data sets
Validating ‘right answer’ sets
Validating full paradigms
Removing defaults
Removing principal parts
Analysis
Conclusions
Future work
