This study uses a corpus of newspaper articles from The Guardian, The Washington Post, and The New York Times to explore contemporary methods of corpus analysis. It argues that a corpus-driven approach is relevant for improving the technology of balancing text corpora; the approach is characterized by its ways of representing corpus data and of unifying the information decoded from the corpus. To this end, urgent news and international reports covering the political strategies of various countries in the context of the Russia-Ukraine war were analyzed. Understanding this issue required outlining the specifics of studying military-political language patterns within the corpus-driven approach. The research found that a variety of data types can influence the formation of the linguistic environment, which explains variations of language system models in textual dimensions. The results also present random concordance lines, the semantic fields of lexemes, and the most frequent units of the journalistic language system, which reveal linguistic changes during the war period. Additionally, the study identifies final variables in statistical models, dominant semantic positions of linguistic constructions, the valency of marked units, and their linguistic potential, status, and functions within the corpus. The use of Text Mining technology, the ANOVA test, and Cortical.io software for data processing is also justified. It is concluded that improving corpus analysis methods depends on text attribution, hierarchical clustering, and determining statistically significant differences between the mean values of variables through analysis of variance. Moreover, the study supports the conclusion that the text processing mechanism plays an important role in the development of a new paradigm of corpus analysis.
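The analysis of variance mentioned above can be illustrated with a minimal sketch. It assumes a one-way design in which each group holds the relative frequencies of one lexical unit in a sample of articles from one source. The function name `one_way_anova_f`, the group labels, and all numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative frequencies (illustrative values only, not study data).
samples = {
    "source_a": [1.0, 2.0, 3.0],
    "source_b": [2.0, 3.0, 4.0],
    "source_c": [3.0, 4.0, 5.0],
}

f_stat, df_b, df_w = one_way_anova_f(list(samples.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value relative to the F distribution with the given degrees of freedom would indicate that the group means differ. In practice a library routine such as `scipy.stats.f_oneway` would typically be used, since it also reports the p-value.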