Abstract

A novel method of statistical text analysis is proposed. We consider the frequency distribution of the first significant digits of numerals in connected authorial English-language texts. Benford’s law is found to hold approximately for these frequencies, with a marked predominance of the digit 1. Differences between the Benford-like distributions of texts by different authors are statistically significant, author-specific features that make it possible, under certain conditions, to address the problem of authorship. For the first significant digits 1, 2, and sometimes 3, the observed frequency of occurrence is usually higher than the probability given by Benford’s law; for larger digits the situation is reversed, and the digit distributions fluctuate strongly, which makes them unrepresentative for our purpose. The proposed approach and the conclusions are supported by examples of computer analysis of works by W. M. Thackeray, M. Twain, R. L. Stevenson, and others. The results are confirmed by the parametric Pearson chi-squared test as well as by the non-parametric Mann–Whitney U and Kruskal–Wallis tests.
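As an illustration of the quantities the abstract refers to, the following minimal sketch computes the Benford probabilities P(d) = log10(1 + 1/d), collects first significant digits from a text, and evaluates a Pearson chi-squared statistic against Benford’s law. It is not the authors' implementation: the abstract does not say how numerals are extracted, so the sketch assumes, for simplicity, that only numbers written with digits are counted, and the function names and the regular expression are hypothetical.

```python
import math
import re
from collections import Counter


def benford_probabilities():
    """Benford's law: P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Assumption: a "numeral" is a run of digits, optionally with thousands
# commas and a decimal part. Numerals written as words are ignored here.
_NUMERAL = re.compile(r"\d[\d,]*(?:\.\d+)?")


def first_significant_digits(text):
    """Return the first non-zero digit of every digit-written number in text."""
    digits = []
    for token in _NUMERAL.findall(text):
        for ch in token:
            if ch in "123456789":
                digits.append(int(ch))
                break  # only the first significant digit is of interest
    return digits


def chi_squared_vs_benford(digits):
    """Pearson chi-squared statistic of observed digit counts vs. Benford's law."""
    n = len(digits)
    if n == 0:
        raise ValueError("no significant digits found")
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_probabilities().items()
    )


sample = "In 1865 he paid 0.05 dollars for 3 apples and 1,250 nails."
sample_digits = first_significant_digits(sample)
sample_statistic = chi_squared_vs_benford(sample_digits)
```

With eight degrees of freedom (nine digit classes minus one), the statistic would be compared against the chi-squared distribution at the chosen significance level; a sample as short as the one above is far too small for a meaningful test and serves only to show the mechanics.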
