Measures of Syntactic Complexity and their Change over Time (the Case of Russian)

Aleksey Melnik, Tatiana Sherstinova, Evgenia Ushakova

doi:10.23919/fruct49677.2020.9211027

Abstract

Syntactic complexity is an important feature of any text, whether written or spoken. Information about syntactic complexity is crucial for successfully solving many practical NLP tasks, ranging from intelligent text understanding to automatic machine translation. For this reason, syntactic complexity and its measures are a focus of attention for NLP developers. A number of measures of syntactic complexity have been developed so far; in this paper, we consider 10 syntactic measures that have been proposed for syntactic stylometric analysis. The pilot experiment described here was conducted on syntactic annotation produced automatically by the UDPipe parser and then manually corrected. Our approach pays particular attention to the stability of certain measures of syntactic complexity and to their variation. In this way, we try to assess which syntactic properties of Russian texts may be considered inherent to the language as a whole and which undergo change over time. To this end, we analyze a corpus of Russian literary texts covering three decades. Because of their high stylistic variability, fiction texts may be considered excellent data for assessing different levels of complexity. The results show the effectiveness of different measures for estimating the syntactic complexity of texts and reveal correlations between these measures.
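The abstract does not list the ten measures, so what follows is only a minimal sketch, assuming three dependency-based measures that are common in the literature (sentence length in tokens, mean dependency distance, and maximum tree depth); these are not necessarily the paper's own set. It reads CoNLL-U, the format UDPipe outputs, computes each measure per sentence, and then computes the Pearson correlation between two measures. The toy trees in `TOY_SENTENCES` and all function names are hypothetical and are not data from the paper's corpus.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy input: three hand-built Russian sentences as
# (form, head, deprel) triples. Illustrative only; NOT UDPipe output
# and NOT data from the paper's corpus.
TOY_SENTENCES = [
    [("Мальчик", 2, "nsubj"), ("читает", 0, "root"), (".", 2, "punct")],
    [("Старый", 2, "amod"), ("учитель", 3, "nsubj"), ("читает", 0, "root"),
     ("интересную", 5, "amod"), ("книгу", 3, "obj"), (".", 3, "punct")],
    [("Он", 2, "nsubj"), ("сказал", 0, "root"), (",", 11, "punct"),
     ("что", 11, "mark"), ("девочка", 11, "nsubj"), (",", 9, "punct"),
     ("которую", 9, "obj"), ("мы", 9, "nsubj"), ("видели", 5, "acl:relcl"),
     (",", 9, "punct"), ("уехала", 2, "ccomp"), (".", 2, "punct")],
]


def to_conllu(sentences):
    """Serialize toy trees into minimal CoNLL-U (10 tab-separated columns)."""
    blocks = []
    for sent in sentences:
        rows = []
        for i, (form, head, deprel) in enumerate(sent, start=1):
            cols = [str(i), form, "_", "_", "_", "_", str(head), deprel, "_", "_"]
            rows.append("\t".join(cols))
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks) + "\n"


def parse_conllu(text):
    """Return a list of sentences, each a list of (token_id, head_id) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comments
            continue
        cols = line.split("\t")
        tok_id = cols[0]
        if "-" in tok_id or "." in tok_id:  # multiword ranges, empty nodes
            continue
        current.append((int(tok_id), int(cols[6])))  # column 7 is HEAD
    if current:
        sentences.append(current)
    return sentences


def sentence_length(sent):
    # Punctuation is counted as tokens here, for simplicity.
    return len(sent)


def mean_dependency_distance(sent):
    # Average linear distance between each token and its head (root excluded).
    dists = [abs(i - h) for i, h in sent if h != 0]
    return float(mean(dists)) if dists else 0.0


def tree_depth(sent):
    # Longest head chain from any token up to the root; the root has depth 1.
    # Assumes a well-formed tree (no cycles).
    heads = dict(sent)

    def depth(i):
        d = 1
        while heads[i] != 0:
            i = heads[i]
            d += 1
        return d

    return max(depth(i) for i in heads)


def pearson(xs, ys):
    # Plain Pearson correlation coefficient between two equal-length lists.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    sents = parse_conllu(to_conllu(TOY_SENTENCES))
    lengths = [sentence_length(s) for s in sents]
    mdds = [mean_dependency_distance(s) for s in sents]
    depths = [tree_depth(s) for s in sents]
    for n, (ln, m, d) in enumerate(zip(lengths, mdds, depths), start=1):
        print(f"sentence {n}: length={ln}, MDD={m:.2f}, depth={d}")
    print(f"r(length, MDD)   = {pearson(lengths, mdds):.3f}")
    print(f"r(length, depth) = {pearson(lengths, depths):.3f}")
```

With real data one would feed in the corrected UDPipe output for whole texts, aggregate the measures per text or per period, and compute correlations over many more observations; three toy sentences only show the mechanics. Excluding punctuation, as many studies do, would change both sentence length and mean dependency distance.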

Full Text