DETERMINING THE MORPHOLOGICAL CLASS OF A WORD DURING THE AUTOMATIC NATURAL LANGUAGE PROCESSING

O Hyryn

doi:10.35433/philology.1(99).2023.75-82

Abstract

The article considers automatic morphological analysis, a mandatory component of the linguistic support of any automatic natural language processing system. Its tasks include determining the place of each text unit in the morphological system of the corresponding language and identifying the word forms of a lexeme. As a result of automatic morphological analysis, each word form in the text is assigned a part-of-speech tag and the values of its grammatical categories (gender, number, case, aspect, tense, person, etc.). The nature and volume of this information, and the methods used to establish it, depend on the purpose of the research within which the automatic analysis is carried out and on the nature of the analyzed texts. Morphological analysis is present at all stages of text analysis, because neither morphemic, nor syntactic, nor semantic analysis can be performed without part-of-speech tagging. In automatic syntactic analysis, word forms in a sentence can be syntactically linked only if lexical-grammatical and grammatical information is available for each word form. The morphological features of text units then become a tool for studying the relationship between vocabulary and grammar and their use in speech, and between paradigmatics (the case forms of declinable words) and syntagmatics (the linear relationships of words and text coherence). The article examines the difficulties that prevent the tagging of text units from being unified, namely lexical-grammatical homonymy, ambiguity of grammatical forms, and polysemy. The study considers approaches to resolving morphological ambiguity that rely on analysis of the context of the ambiguous word; these can be divided into statistical and rule-based approaches. Rules can be compiled manually or derived from annotated corpora. Statistical methods are based on quantitative indicators obtained from large annotated corpora. Morphological ambiguity resolution methods are usually applied after primary tagging, which is typically performed using dictionaries. The article also provides a morphemic analysis algorithm for automatic morphological analysis.
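
The two-stage scheme described in the abstract (dictionary-based primary tagging followed by context-based ambiguity resolution) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon, the tag names, and the three hand-written context rules are hypothetical and are not the algorithm given in the article.

```python
# Hypothetical toy lexicon: each word form maps to ALL of its candidate tags.
LEXICON = {
    "the": {"DET"},
    "she": {"PRON"},
    "can": {"AUX", "NOUN", "VERB"},
    "rusts": {"NOUN", "VERB"},
    "swim": {"NOUN", "VERB"},
}


def primary_tagging(tokens):
    """Stage 1: dictionary lookup. Unknown word forms receive the tag 'UNK'."""
    return [(tok, set(LEXICON.get(tok.lower(), {"UNK"}))) for tok in tokens]


def disambiguate(tagged):
    """Stage 2: rule-based resolution using the tag of the previous word.

    The rules are illustrative assumptions, not a complete grammar.
    Word forms that no rule resolves keep all candidates, joined by '|'.
    """
    result = []
    for token, candidates in tagged:
        prev_tag = result[-1][1] if result else None
        if len(candidates) > 1:
            if prev_tag == "DET" and "NOUN" in candidates:
                candidates = {"NOUN"}   # determiner + X -> noun
            elif prev_tag == "PRON" and "AUX" in candidates:
                candidates = {"AUX"}    # pronoun + X -> auxiliary
            elif prev_tag in ("AUX", "NOUN") and "VERB" in candidates:
                candidates = {"VERB"}   # auxiliary/noun + X -> verb
        result.append((token, "|".join(sorted(candidates))))
    return result


def tag(sentence):
    """Run both stages on a whitespace-tokenized sentence."""
    return disambiguate(primary_tagging(sentence.split()))
```

With this lexicon, `tag("The can rusts")` yields `[("The", "DET"), ("can", "NOUN"), ("rusts", "VERB")]`, while `tag("She can swim")` resolves the same word form "can" to `AUX`. A statistical variant would replace the hand-written rules with tag-sequence frequencies estimated from an annotated corpus.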
