A New Stylometry Method Basing on the Numerals Statistic

Andrei Viacheslavovich Zenkov

doi:10.11648/j.ijdst.20170302.11

Abstract

A new method of statistical analysis of texts is proposed. The frequency distribution of the first significant digits in numerals of coherent authorial English-language texts is considered. Benford's law is found to hold approximately for these frequencies, with a marked predominance of the digit 1. Deviations from Benford's law are statistically significant authorial peculiarities that, under certain conditions, make it possible to address the question of authorship and to distinguish between texts by different authors. At the end of the {1, 2, …, 8, 9} row, the digit distribution is subject to strong fluctuations and is therefore unrepresentative for our purpose. The proposed approach and the conclusions are supported by examples of computer analysis of works by W. M. Thackeray, M. Twain, R. L. Stevenson and others. The results are confirmed by the non-parametric rank-based Mann-Whitney and Kruskal-Wallis tests as well as the parametric Pearson's chi-squared test.
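As a rough illustration of the comparison with Benford's law, the following Python sketch computes the Benford proportions P(d) = log10(1 + 1/d) and Pearson's chi-squared statistic for a list of observed first-digit counts. The function names, the optional max_digit cut-off (which drops the fluctuating tail of the {1, …, 9} row and renormalizes the expected proportions over the digits that remain), and the example counts are assumptions for illustration only; they are not the author's procedure or data.

```python
import math


def benford_probabilities(max_digit=9):
    """Benford proportions P(d) = log10(1 + 1/d) for d = 1..max_digit,
    renormalized to sum to 1 when tail digits are dropped (an assumption)."""
    raw = [math.log10(1 + 1 / d) for d in range(1, max_digit + 1)]
    total = sum(raw)
    return [p / total for p in raw]


def benford_chi_squared(counts, max_digit=9):
    """Pearson's chi-squared statistic of observed first-digit counts
    (counts[0] is the count of digit 1, counts[1] of digit 2, ...)
    against Benford's law restricted to digits 1..max_digit.
    Returns (statistic, degrees_of_freedom)."""
    observed = counts[:max_digit]
    if len(observed) < max_digit:
        raise ValueError("need a count for every digit up to max_digit")
    n = sum(observed)
    expected = [n * p for p in benford_probabilities(max_digit)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, max_digit - 1


# Hypothetical first-digit counts for digits 1..9 (illustrative, not from the paper)
counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
stat, df = benford_chi_squared(counts)
print(f"chi2 = {stat:.4f}, df = {df}")
# Same comparison restricted to the more representative digits 1..5
stat5, df5 = benford_chi_squared(counts, max_digit=5)
print(f"chi2 (digits 1-5) = {stat5:.4f}, df = {df5}")
```

The statistic is compared with the chi-squared distribution with max_digit - 1 degrees of freedom; for the full row (8 degrees of freedom) the 5% critical value is about 15.51.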

Highlights

The scope of the practical use of Benford’s law [1] has significantly expanded
Known for over a hundred years, Benford's law refers to the probability of occurrence of a certain first significant digit in the distribution of various real life data
We present here new research results concerning the distribution of the first significant digits of numerals contained in coherent English-language texts
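For reference, Benford's law in its usual form assigns the first significant digit d the probability below. The probabilities telescope to a total of 1, and they decrease from P(1) ≈ 0.301 to P(9) ≈ 0.046.

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \dots, 9,

\sum_{d=1}^{9} P(d) = \log_{10} \prod_{d=1}^{9} \frac{d+1}{d} = \log_{10} 10 = 1,

P(1) \approx 0.301, \quad P(2) \approx 0.176, \quad P(3) \approx 0.125, \quad \dots, \quad P(9) \approx 0.046.
```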

Summary

Introduction

The scope of the practical use of Benford’s law [1] has significantly expanded. In contrast to the traditional methodology of applying Benford's law, which treats deviations from the law as an indication of possible "falsification" (broadly defined), emphasis has been placed on comparing these deviations across texts by different authors, showing that they are statistically robust authorial features that make it possible to distinguish between texts by different authors (under certain conditions, the most important of which is a sufficiently large text). Building on these ideas, we present here new research results concerning the distribution of the first significant digits of numerals contained in coherent English-language texts. For all English-language fiction texts subjected to computer-aided statistical analysis, we studied the frequency of occurrence of the various first significant digits of numerals, taking into account both cardinal and ordinal numerals, whether expressed in figures or (considerably more often) in words. The texts analyzed were mainly taken from the Project Gutenberg website http://www.gutenberg.org.
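The counting step described above can be sketched in Python as follows. This is a simplified illustration, not the author's procedure: the word lists, the tokenization, the helper names, and the rule that a run of consecutive number words (e.g. "two hundred", "twenty-five") counts as one numeral whose first significant digit comes from its first word are all assumptions made here. Connectives are not handled, so "one hundred and twenty" is counted as two numerals, and ambiguous words such as "second" (a unit of time) or "one" used as a pronoun are not disambiguated.

```python
import re
from collections import Counter

# Hypothetical mapping from English number words (cardinal and ordinal)
# to the first significant digit of the number they denote.
_WORD_DIGIT = {}
for digit, words in {
    1: "one first ten tenth eleven eleventh twelve twelfth thirteen thirteenth "
       "fourteen fourteenth fifteen fifteenth sixteen sixteenth seventeen "
       "seventeenth eighteen eighteenth nineteen nineteenth hundred hundredth "
       "thousand thousandth million millionth",
    2: "two second twenty twentieth",
    3: "three third thirty thirtieth",
    4: "four fourth forty fortieth",
    5: "five fifth fifty fiftieth",
    6: "six sixth sixty sixtieth",
    7: "seven seventh seventy seventieth",
    8: "eight eighth eighty eightieth",
    9: "nine ninth ninety ninetieth",
}.items():
    for w in words.split():
        _WORD_DIGIT[w] = digit

# Tokens are either alphabetic words or numbers written in figures.
_TOKEN = re.compile(r"[A-Za-z]+|\d[\d,.]*")


def first_digit_of_figure(token):
    """First non-zero digit of a numeral in figures, e.g. '1,500' -> 1, '0.07' -> 7."""
    for ch in token:
        if ch in "123456789":
            return int(ch)
    return None  # e.g. '0' has no significant digit


def first_digit_counts(text):
    """Count first significant digits of numerals in text (figures and words)."""
    counts = Counter()
    in_run = False  # True while inside a run of consecutive number words
    for tok in _TOKEN.findall(text):
        if tok[0].isdigit():
            d = first_digit_of_figure(tok)
            if d is not None:
                counts[d] += 1
            in_run = False
        elif tok.lower() in _WORD_DIGIT:
            # Only the first word of a run such as "two hundred" is counted.
            if not in_run:
                counts[_WORD_DIGIT[tok.lower()]] += 1
            in_run = True
        else:
            in_run = False
    return counts


if __name__ == "__main__":
    sample = "Twenty-five men came on the 3rd day; two hundred stayed, one left."
    print(sorted(first_digit_counts(sample).items()))
```

The resulting counts for digits 1 to 9 are the kind of input a comparison with Benford's law would take.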

Distribution of First Significant Digits of Numerals in Compound Texts

Distribution of First Significant Digits of Numerals in Coherent Texts

Jane Austen and Her Imitators

Authorship of the 15th Book of Oz

Testing of Methodology

Conclusion

Journal: International Journal on Data Science and Technology
Publication Date: Jan 1, 2017
License type: CC BY

Similar Papers

A Method of Text Attribution Based on the Statistics of Numerals
Andrei V Zenkov
Journal of Quantitative Linguistics | VOL. 25
20 Sep 2017

PRIMJENA BENFORDOVA ZAKONA PRILIKOM OTKRIVANJA PSIHOLOŠKO ODREĐENIH CIJENA (Application of Benford's Law in Detecting Psychologically Determined Prices)
...
Zbornik radova - Journal of Economy and Business | VOL. -
25 Dec 2018

Natural taxonomic categories of angiosperms obey Benford's law, but artificial ones do not
Lucía Campos ... Antonio Flores-Moya
Systematics and Biodiversity | VOL. 14
27 May 2016

Book Review
Robert Pinsker
Journal of Information Systems | VOL. 26
01 Mar 2012
