Abstract

The statistical properties of the lexis of natural-language texts have attracted close attention from mathematicians and linguists over the past hundred years. The basic law relating a word’s frequency to its rank, known as Zipf’s law, states that the product of a word’s frequency and its rank is approximately constant; this regularity serves as a marker of the language of the text. Numerous analogs of Zipf’s law in other subject areas are of great practical significance for describing how various sociotechnical systems function: Auerbach’s law of the distribution of cities by population, Pareto’s law of the distribution of material goods in society, Lotka’s law of the distribution of scientists by productivity, and Bradford’s law of the distribution of publications across bibliographic sources. This article considers the latent frequency characteristics of the lexis of scientific texts on various topics, using the array of articles in the multidisciplinary open-access journal “Young Scientist”. It formulates the concept of a higher-order frequency table and empirically investigates how the frequencies in such tables depend on their ranks. Power, hyperbolic, and other two-parameter models of the dependence of frequencies on ranks in higher-order frequency tables were constructed for all articles in the corpus under consideration. The constructed models generalize the well-known Zipf models and have high quality indicators. The paper also obtains new predictors that can be useful for classifying scientific texts.
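
As an illustration of the classical rank-frequency relationship described above, the following minimal sketch builds an ordinary (first-order) word frequency table and fits a two-parameter power model f(r) = C * r^(-a) by least squares in log-log coordinates. It is not the authors' procedure: the function names, the whitespace tokenization, and the fitting method are assumptions made for illustration, and the higher-order frequency tables of the paper are not reproduced here.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Return word frequencies sorted in descending order (rank 1 first).

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    words = text.lower().split()
    return sorted(Counter(words).values(), reverse=True)


def fit_power_law(freqs):
    """Fit f(r) = C * r**(-a) by least squares on log f = log C - a * log r.

    Requires at least two ranks. Returns the pair (C, a).
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Frequencies that follow Zipf's law exactly: frequency * rank = 120.
ideal = [120 / r for r in range(1, 6)]
C, a = fit_power_law(ideal)
print(round(C, 6), round(a, 6))
```

For frequencies that obey Zipf's law exactly, the fitted exponent a is 1 and C equals the constant product of frequency and rank; on real texts both parameters are estimated from the data and could serve as simple numerical descriptors of a text.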
