Abstract

A one-consonant group approach to authorship attribution is proposed. The approach uses the chi-square test to determine the consonant group in which the difference between texts by different authors is statistically significant. The developed model defines the author-differentiating capability of each consonant group as the ratio of the number of comparisons in which the difference between the texts by two authors is statistically significant to the total number of comparisons. The general author-differentiating capability of the group of stop consonants, which is a statistical parameter of the authorial style, is the highest in comparisons of texts from the publicist and belles-lettres styles. The one-consonant group approach simplifies the whole process of authorship attribution and ensures a higher level of automation. Experiments conducted using the Java programming language have shown that the chi-square test is a powerful nonparametric statistical test that can be used for author identification on the level of English consonants with a test validity of 95%.
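The ratio described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it is written in Python rather than Java, it counts letters as a stand-in for transcribed consonant phonemes, and it assumes a Pearson chi-square test on a 2 x k table of counts at the 0.05 significance level. The names `STOPS`, `group_counts`, `chi_square_statistic`, `differs`, and `differentiating_capability` are hypothetical.

```python
# Assumption: letters approximate the stop consonants; the study itself
# works on the level of English consonant phonemes.
STOPS = "pbtdkg"

# Chi-square critical values at the 0.05 significance level,
# keyed by degrees of freedom (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def group_counts(text, group=STOPS):
    """Count occurrences of each member of the consonant group in a text."""
    lowered = text.lower()
    return [lowered.count(ch) for ch in group]


def chi_square_statistic(counts_a, counts_b):
    """Return (Pearson chi-square statistic, degrees of freedom) for a 2 x k table."""
    # Drop categories that are absent from both texts.
    columns = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    total_a = sum(a for a, _ in columns)
    total_b = sum(b for _, b in columns)
    total = total_a + total_b
    if total_a == 0 or total_b == 0 or len(columns) < 2:
        return 0.0, 0
    stat = 0.0
    for a, b in columns:
        col = a + b
        exp_a = total_a * col / total  # expected count for text A
        exp_b = total_b * col / total  # expected count for text B
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat, len(columns) - 1


def differs(counts_a, counts_b):
    """True if the two texts differ significantly at the 0.05 level."""
    stat, df = chi_square_statistic(counts_a, counts_b)
    return df > 0 and stat > CRITICAL_05[df]


def differentiating_capability(samples_a, samples_b):
    """Share of cross-author comparisons with a significant difference."""
    pairs = [(a, b) for a in samples_a for b in samples_b]
    significant = sum(differs(a, b) for a, b in pairs)
    return significant / len(pairs)
```

In this sketch, each text sample by one author is compared with each sample by the other author, and the capability of the group is the fraction of those comparisons in which the chi-square statistic exceeds the critical value.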

Highlights

  • A language is not a strictly arranged system and has a probabilistic and stochastic character

  • The experiments have been aimed at determining the consonant group in which the author can be identified

  • Comparisons on the group of stop coronal consonants have a lower author-differentiating capability than the stop consonant comparison


Summary

Introduction

A language is not a strictly arranged system; it has a probabilistic and stochastic character, so it is advisable to apply statistical methods. The analysis of recent publications has shown that authorship attribution has been performed on all language levels: morphological, lexical, and syntactical [1,2]. The choice of language level is of great importance for formalization, because the structure of the level must be strict. The structure of the lexical level is not easy to formalize because neologisms and foreign loans constantly enlarge the vocabulary. Syntactical structures vary from simple to complicated, and the latter are difficult to formalize [5,6,7].
