Investigating hybrids of evolutionary search and linear discriminant analysis for authorship attribution

Kareem Shaker, Richard Everson, David Corne

doi:10.1109/cec.2007.4424728

Abstract

Authorship attribution is the problem of determining who is (or was) the author of one or more texts whose authorship is disputed. There are many well-known cases of disputed authorship; in this paper we consider the Federalist Papers and the 15th Book of Oz. We treat the problem as one of supervised classification and use evolutionary algorithms to search through subsets of function words, which in turn form the basis for predicting authorship via linear discriminant analysis. We compare two approaches, both centred on the optimization of ROC curves; because of the size of the text corpora in dispute, extensive experimentation is difficult. On both datasets, the hybrid EA approach classified the disputed works with 100% accuracy, using small sets of function words comparable to or better than those used in previous work on these cases.
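The abstract only outlines the hybrid: an evolutionary search proposes subsets of function words, and linear discriminant analysis (LDA) on those words scores each text. The following is a minimal sketch of that loop under stated assumptions that do not come from the paper. It uses a bit-string genome (one bit per candidate word) and two-class Fisher LDA with a small ridge term. Single-objective fitness is the hold-out area under the ROC curve (AUC) minus a hypothetical per-word `penalty` that favours small subsets. Selection is tournament selection with elitist survival. The paper's own ROC-optimization schemes may be quite different, and the data in the usage block is random and purely illustrative.

```python
import numpy as np


def lda_scores(X_train, y_train, X_test, reg=1e-3):
    """Two-class Fisher LDA: project onto w = Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X_train[y_train == 0], X_train[y_train == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    D0, D1 = X0 - mu0, X1 - mu1
    # Pooled within-class scatter, ridge-regularised (reg is an assumed value).
    Sw = D0.T @ D0 + D1.T @ D1 + reg * np.eye(X_train.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return X_test @ w  # larger score => more like class 1


def auc(pos_scores, neg_scores):
    """AUC as the Mann-Whitney statistic: P(pos > neg), ties count 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def fitness(mask, X_tr, y_tr, X_va, y_va, penalty=0.01):
    """Hold-out AUC of LDA on the selected words, minus a hypothetical size penalty."""
    if not mask.any():
        return 0.0
    cols = np.flatnonzero(mask)
    s = lda_scores(X_tr[:, cols], y_tr, X_va[:, cols])
    return auc(s[y_va == 1], s[y_va == 0]) - penalty * len(cols)


def evolve(X_tr, y_tr, X_va, y_va, pop_size=30, generations=40, p_init=0.2, seed=0):
    """Simple elitist GA over bit masks of candidate function words."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    p_mut = 1.0 / d  # on average one bit flip per child
    pop = rng.random((pop_size, d)) < p_init
    fit = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in pop])
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Binary tournament selection for each parent.
            a, b = rng.integers(pop_size, size=2), rng.integers(pop_size, size=2)
            p1, p2 = pop[a[np.argmax(fit[a])]], pop[b[np.argmax(fit[b])]]
            child = np.where(rng.random(d) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(d) < p_mut  # bit-flip mutation
            children.append(child)
        children = np.array(children)
        child_fit = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in children])
        # (mu + lambda) survival: keep the best pop_size of parents and children.
        all_pop = np.vstack([pop, children])
        all_fit = np.concatenate([fit, child_fit])
        keep = np.argsort(all_fit)[::-1][:pop_size]
        pop, fit = all_pop[keep], all_fit[keep]
    best = int(np.argmax(fit))
    return pop[best], float(fit[best])


if __name__ == "__main__":
    # Synthetic stand-in: 40 "function word" rates, two authors differing on words 0-2.
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 30)
    X = rng.normal(size=(60, 40))
    X[y == 1, :3] += 1.5
    tr, va = np.r_[0:20, 30:50], np.r_[20:30, 50:60]  # stratified split
    mask, f = evolve(X[tr], y[tr], X[va], y[va])
    print("selected words:", np.flatnonzero(mask), "fitness:", round(f, 3))
```

The size penalty is one simple way to express a preference for small word sets inside a single scalar fitness; a multi-objective formulation that trades ROC performance against subset size directly would be a natural alternative.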
