Abstract

In 1985, Zwicky argued that ‘particle’ is a pretheoretical notion that should be eliminated from linguistic analysis. We propose a reclassification of Russian particles that implements Zwicky’s directive. Russian particles lack a coherent conceptual basis as a category, and many are ambiguous with respect to part of speech. Our corpus analysis of Russian particles addresses theoretical questions about the cognitive status of parts of speech and practical concerns about how particles should be represented in computational models. We focus on nine high-frequency words commonly classed as particles: ešče, tak, ved’, slovno, daže, že, li, da, net. We show that the current tagging of particles in the manually disambiguated Morphological Standard of the Russian National Corpus is not entirely consistent, and that this can create challenges for training a part-of-speech tagger. We offer an alternative tagging scheme that eliminates the category of ‘particle’ altogether, and we show that this enriched scheme makes it possible for a part-of-speech tagger to achieve more useful results. Our analysis provides a detailed account of the sub-uses of each particle that correspond to different parts of speech, the relationships among those sub-uses, and their relative distribution. In this sense, our study also contributes to the study of words that exhibit part-of-speech ambiguities.
