Authorship Attribution Using Principal Component Analysis and Competitive Neural Networks

Mehmet Can

doi:10.3390/mca19010021

Abstract

Feature extraction is a common problem in statistical pattern recognition. It refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed so that the data set can be represented by a reduced number of "effective" features while retaining most of its intrinsic information content; in other words, the data set undergoes dimensionality reduction. Principal component analysis is one such process. In this paper, the data collected by counting selected syntactic characteristics in around a thousand paragraphs of each sample book were subjected to principal component analysis. The authors of the texts are then identified by competitive neural networks that use these effective features.
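To make the dimensionality-reduction step concrete, here is a minimal sketch of principal component analysis in plain Python. It assumes a data matrix whose rows are paragraphs and whose columns are per-paragraph counts of syntactic characteristics; the paper does not state which numerical method it used, so the eigenvectors here are found by power iteration with deflation, and all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center(data: Matrix) -> Tuple[Matrix, List[float]]:
    """Subtract each feature's mean so every column has mean zero."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data], means


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (d x d) of mean-centred data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def power_iteration(cov: Matrix, iters: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Largest eigenvalue and its unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # remaining matrix is zero: no variance left
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def principal_components(data: Matrix, k: int) -> Tuple[List[float], Matrix, List[float]]:
    """Top-k eigenvalues/eigenvectors of the covariance matrix, plus feature means."""
    centered, means = center(data)
    cov = covariance(centered)
    d = len(cov)
    eigvals, eigvecs = [], []
    for _ in range(k):
        lam, v = power_iteration(cov)
        eigvals.append(lam)
        eigvecs.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return eigvals, eigvecs, means


def project(data: Matrix, eigvecs: Matrix, means: List[float]) -> Matrix:
    """Scores of each row on the given components (the reduced feature vectors)."""
    return [[sum((row[j] - means[j]) * v[j] for j in range(len(means))) for v in eigvecs]
            for row in data]
```

Each eigenvalue divided by the sum of all eigenvalues indicates the share of variance that component keeps, which is how one would judge whether the first few components retain "most of the intrinsic information content". For larger feature sets, a library eigensolver such as `numpy.linalg.eigh` would be the usual choice over hand-written power iteration.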

Highlights

Problems of authorship have always been attacked with traditional research methods: unearthing and dating original manuscripts, for instance.
In this paper, the data collected by counting selected syntactic characteristics in around a thousand paragraphs of each sample book were subjected to principal component analysis.
Instead of cluster analysis of two-dimensional plots, this paper performs author attribution with artificial neural networks whose output neurons compete on the first principal components of the data.
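The idea of output neurons competing on principal-component scores can be sketched as a winner-take-all network. This page does not give the paper's architecture or training rule, so the sketch assumes standard unsupervised competitive learning with a linearly decaying learning rate, followed by a hypothetical labelling step that assigns each output neuron the majority author of the training paragraphs it wins. All names and parameters are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def winner(weights: List[Vector], x: Sequence[float]) -> int:
    """The competition: the output neuron whose weight vector is nearest to x wins."""
    return min(range(len(weights)), key=lambda k: sq_dist(weights[k], x))


def train_competitive(data: List[Vector], n_neurons: int, epochs: int = 50,
                      eta0: float = 0.5, seed: int = 0) -> List[Vector]:
    """Unsupervised winner-take-all learning on principal-component scores."""
    rng = random.Random(seed)
    weights = [list(v) for v in rng.sample(data, n_neurons)]  # start on data points
    order = list(range(len(data)))
    for epoch in range(epochs):
        eta = eta0 * (1.0 - epoch / epochs)  # learning rate decays linearly
        rng.shuffle(order)
        for i in order:
            x = data[i]
            k = winner(weights, x)
            # Only the winning neuron moves, a fraction eta of the way toward x.
            weights[k] = [w + eta * (xi - w) for w, xi in zip(weights[k], x)]
    return weights


def label_neurons(weights: List[Vector], data: List[Vector],
                  authors: List[str]) -> Dict[int, str]:
    """Label each neuron with the author of most training paragraphs it wins."""
    votes: Dict[int, Counter] = {}
    for x, author in zip(data, authors):
        votes.setdefault(winner(weights, x), Counter())[author] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}


def attribute(weights: List[Vector], labels: Dict[int, str], x: Sequence[float]) -> str:
    """Attribute an unseen paragraph (given as PC scores) to an author."""
    return labels.get(winner(weights, x), "unknown")
```

In use, each training vector would be a paragraph's scores on the first few principal components, and `n_neurons` would be at least the number of candidate authors; a neuron that wins no training paragraph stays unlabelled, so `attribute` reports "unknown" for inputs it captures.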

Summary

INTRODUCTION

Problems of authorship have always been attacked with traditional research methods: unearthing and dating original manuscripts, for instance. In the case of the disputed Federalist Papers, Hamilton and Madison have otherwise very similar styles, with nearly identical sentence length distributions, as noted by Juola [5]; nevertheless, Mosteller and Wallace found sharp differences in their preferences for certain function words. For instance, the word “upon” appears 3.24 times per 1000 words in Hamilton and just 0.23 times per 1000 words in Madison [1]. Using these frequencies in a Bayesian model, they showed that Madison had most likely written all 12 disputed papers. This technique will be elaborated further.
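The per-1000-word rates quoted above are simple relative frequencies. The sketch below shows one way to compute them; the tokenizer regular expression, the function-word list and the example sentence are illustrative assumptions, not the feature set of this paper, which counted selected syntactic characteristics per paragraph.

```python
import re
from collections import Counter
from typing import Dict, Sequence

# Lower-case words, optionally with one internal apostrophe (e.g. "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def rates_per_thousand(text: str, function_words: Sequence[str]) -> Dict[str, float]:
    """Occurrences of each function word per 1000 tokens of text."""
    tokens = WORD_RE.findall(text.lower())
    if not tokens:
        return {w: 0.0 for w in function_words}
    counts = Counter(tokens)
    return {w: 1000.0 * counts[w] / len(tokens) for w in function_words}


# Hypothetical example sentence, not taken from the Federalist Papers.
print(rates_per_thousand("Upon reflection, the court relied upon precedent.",
                         ["upon", "the", "by"]))
```

The example sentence has seven tokens, two of them "upon", so its "upon" rate is 2000/7, roughly 286 per 1000 words; rates computed over whole texts, like the 3.24 and 0.23 figures above, are far more stable than rates from a single sentence.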

PRINCIPAL COMPONENT ANALYSIS

Theory of Principal Component Analysis

ARTIFICIAL NEURAL NETWORKS

Multilayer Perceptrons

PROBLEM DEFINITION

PRINCIPAL COMPONENTS OF SAMPLE TEXTS

APPLICATION TO AUTHOR ATTRIBUTION

Findings

CONCLUSIONS


Journal: Mathematical and Computational Applications
Publication Date: Apr 1, 2014
Citations: 13
License type: CC BY 3.0

Similar Papers

Authorship Attribution Using Principal Component Analysis and Nearest Neighbor Rule for Neural Networks
Mehmet Can
Southeast Europe Journal of Soft Computing | VOL. 1
24 Oct 2012

Principal Component Analysis and Neural Networks for Authorship Attribution
Mehmet Can
Southeast Europe Journal of Soft Computing | VOL. 1
29 Feb 2012

Principal Component Analysis for Authorship Attribution
Amir Jamak
Southeast Europe Journal of Soft Computing | VOL. 1
29 Feb 2012