XML-based Analysis of Early 20th-Century Russian State Duma Verbatims

Nadezhda Povroznik

doi:10.4000/ilcea.9338

Abstract

This study analyses the documentation of the State Duma of the Russian Empire (1906–1917), which placed legal limits on the power of the monarch. The methodology rests on three steps: building a corpus of texts from the personal alphabetical indexes to the verbatim reports of the First through Fourth State Dumas, marking up the texts with a purpose-built XML scheme, and tracing changes in the structure of the sources through the distribution of tags. The corpus totals 749,793 words. The markup scheme reflects the structure of the indexes and includes source metadata, personal characteristics of the deputies, and tags relating to the deputies' parliamentary activities. The article examines the structure of the indexes and how it differs across sessions, using a matrix representation of the data. Analysis of the markup data shows that the structure of the personal indexes to the verbatim reports changed significantly over their publication between 1906 and 1917. Differences in index structure exist both between Dumas and among sessions within a single Duma. The earliest indexes contained additional information about the election process, which was omitted from later documents. The social characteristics of the deputies were likewise not published in a consistent form: the data sets varied from Duma to Duma and differed in completeness and informativeness. The methodology has proven effective for studying the dynamics of the source structure and for extracting data for the subsequent study of deputies' activity using mathematical methods.
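The tag-distribution approach described above can be illustrated with a minimal sketch. It assumes each session's index is available as one XML string, counts how often each tag occurs, and arranges the counts as a session-by-tag matrix. The tag names (`deputy`, `election`, `estate`, `speech`), the sample documents, and the function names are hypothetical and are not taken from the article's markup scheme.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical marked-up index fragments, one per session.
# Tag names are illustrative only, not the article's actual scheme.
SAMPLE = {
    "Duma I, session 1": (
        "<index><deputy><name/><election/><speech/><speech/></deputy></index>"
    ),
    "Duma III, session 2": (
        "<index><deputy><name/><estate/><speech/></deputy>"
        "<deputy><name/><speech/></deputy></index>"
    ),
}


def tag_counts(xml_text):
    """Count occurrences of every tag in one XML document."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


def tag_matrix(docs):
    """Return (sorted tag names, one row of counts per document)."""
    counts = {label: tag_counts(text) for label, text in docs.items()}
    tags = sorted(set().union(*counts.values()))
    # A Counter returns 0 for tags that never occur in a document.
    rows = [[counts[label][tag] for tag in tags] for label in docs]
    return tags, rows


if __name__ == "__main__":
    tags, rows = tag_matrix(SAMPLE)
    print(tags)
    for label, row in zip(SAMPLE, rows):
        print(label, row)
```

A zero in a column marks a tag that is absent from that session's index, which is the kind of structural difference between sessions that the matrix representation is meant to expose.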

Full Text