Abstract

The paper concerns implementing maximum entropy tagging model and neural net dependency parser model for Russian language in Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. Russian belongs to morphologically rich languages and demands full morphological analysis including annotating input texts with POS tags, features and lemmas (unlike the case of case-, person-, etc. insensitive languages when stemming and POS-tagging give enough information about grammatical behavior of a word form). Rich morphology is accompanied by free word order in Russian which adds indeterminacy to head finding rules in parsing procedures. In the paper we describe training data, linguistic features used to learn the classifiers, training and evaluation of tagging and parsing models.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call