Abstract

This article presents a comparative study of four morphological parsers for Russian (mystem, pymorphy2, TreeTagger, and FreeLing) on the two main tasks of morphological analysis: lemmatization and POS tagging. The experiments were conducted on three currently available Russian corpora with high-quality morphological annotation: the Russian National Corpus, OpenCorpora, and RU-EVAL (a small corpus created in 2010 to evaluate parsers). As evaluation measures, the authors use accuracy for lemmatization and the F1-measure for POS tagging. The authors provide an error analysis, identify the parts of speech that are most difficult for the parsers, and analyze how the parsers perform on dictionary words and on predicted words.
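
The two evaluation measures named above can be illustrated with a minimal sketch. It assumes that gold and predicted labels are already aligned token by token, and it averages F1 over tags without weighting (macro averaging); the abstract does not say which averaging the authors used, so that choice is an assumption. The function names `lemma_accuracy`, `per_tag_f1`, and `macro_f1` are hypothetical and do not come from the paper or from any of the four parsers.

```python
from collections import Counter


def lemma_accuracy(gold, pred):
    """Share of tokens whose predicted lemma equals the gold lemma."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned token by token")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_tag_f1(gold, pred):
    """F1 for each POS tag, treating every tag as a one-vs-rest problem."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned token by token")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p where it was not correct
            fn[g] += 1  # missed an occurrence of gold tag g
    scores = {}
    for tag in set(gold) | set(pred):
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        scores[tag] = 2 * tp[tag] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 (an assumed averaging scheme)."""
    scores = per_tag_f1(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy data, invented for illustration only.
gold_lemmas = ["мама", "мыть", "рама"]
pred_lemmas = ["мама", "мыть", "рам"]
gold_tags = ["NOUN", "VERB", "NOUN", "ADJ"]
pred_tags = ["NOUN", "VERB", "ADJ", "ADJ"]

accuracy = lemma_accuracy(gold_lemmas, pred_lemmas)
f1_by_tag = per_tag_f1(gold_tags, pred_tags)
f1_macro = macro_f1(gold_tags, pred_tags)
```

Per-tag scores such as `f1_by_tag` are what make it possible to see which parts of speech are hardest for a given parser, since a single aggregate number hides that breakdown.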
