Abstract

Feature extraction is a common problem in statistical pattern recognition. It refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. The transformation is designed, however, so that the data set may be represented by a reduced number of "effective" features and yet retain most of its intrinsic information content; in other words, the data set undergoes a dimensionality reduction. Principal component analysis is one such process. In this paper, data collected by counting selected syntactic characteristics in around a thousand paragraphs of each sample book were subjected to principal component analysis. The authors of the texts are then identified by competitive neural networks that use these effective features.
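The dimensionality reduction described above can be illustrated with a minimal sketch of principal component analysis via eigendecomposition of the sample covariance matrix. The function name `pca_reduce`, the synthetic `counts` matrix, and the choice of two retained components are assumptions made for illustration only; they are not the paper's data or code.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]  # the "effective" features
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, explained

# Hypothetical stand-in for per-paragraph syntactic counts:
# 200 paragraphs, 5 counted characteristics driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
counts = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

scores, explained = pca_reduce(counts, n_components=2)
```

Here the feature space still has five axes in principle, but almost all of the variance falls on the first two, so each paragraph can be summarized by two scores.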

Highlights

  • Problems of authorship have always been attacked with traditional research methods: unearthing and dating original manuscripts, for instance

  • In this paper, data collected by counting selected syntactic characteristics in around a thousand paragraphs of each sample book were subjected to principal component analysis

  • In this paper, instead of cluster analysis of two-dimensional plots, authorship is attributed using artificial neural networks whose output neurons compete on the data of the first principal components
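As a rough illustration of output neurons competing on principal component data, the following sketch uses a simple winner-take-all learning rule: for each input, only the closest unit moves toward it. This is an assumed, generic form of competitive learning and not necessarily the network used in the paper; the names `train_competitive` and `winners`, the learning rate, and the synthetic two-author data are all hypothetical.

```python
import numpy as np

def train_competitive(X, n_units, epochs=20, lr=0.1, seed=0):
    """Winner-take-all learning: only the closest unit moves toward each input."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize each unit's weight vector from a randomly chosen sample.
    W = X[rng.choice(len(X), size=n_units, replace=False)].copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            winner = np.argmin(np.linalg.norm(W - X[i], axis=1))
            W[winner] += lr * (X[i] - W[winner])
    return W

def winners(X, W):
    """Index of the winning (closest) unit for every row of X."""
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical first-two-principal-component scores for two authors.
rng = np.random.default_rng(1)
author_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
author_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
pc_scores = np.vstack([author_a, author_b])

W = train_competitive(pc_scores, n_units=2)
labels = winners(pc_scores, W)
```

With one output unit per candidate author, the winning unit for a paragraph plays the role of the attributed author; real texts would of course be far less cleanly separated than this synthetic example.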


Summary

INTRODUCTION

Problems of authorship have always been attacked with traditional research methods, such as unearthing and dating original manuscripts. Although Hamilton and Madison otherwise have very similar styles, with nearly identical sentence-length distributions, as noted by Juola [5], Mosteller and Wallace found sharp differences in their preferences for particular function words: for instance, the word "upon" appears 3.24 times per 1000 words in Hamilton but just 0.23 times in Madison [1]. Adjusting these frequencies with a Bayesian model, they showed that Madison had most likely written all 12 disputed Federalist papers. This technique is going to be elaborated
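The "times per 1000 words" figures quoted above are normalized function-word rates. A minimal sketch of how such a rate can be computed follows; the tokenizer (lowercase runs of letters and apostrophes) and the function name `rate_per_thousand` are assumptions for illustration, and the Bayesian adjustment used by Mosteller and Wallace is not shown.

```python
import re

def rate_per_thousand(text, word):
    """Occurrences of `word` per 1000 tokens of `text` (case-insensitive)."""
    # Assumed tokenization: runs of letters and apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000.0 * tokens.count(word.lower()) / len(tokens)

# Toy text: 3 occurrences of "upon" among 1000 tokens in total.
sample = "upon " * 3 + "the " * 997
upon_rate = rate_per_thousand(sample, "upon")
```

Comparing such rates across authors for a fixed set of function words gives the kind of stylistic profile the paragraph describes.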

PRINCIPAL COMPONENT ANALYSIS
Theory of Principal Component Analysis
ARTIFICIAL NEURAL NETWORKS
Multilayer Perceptrons
PROBLEM DEFINITION
PRINCIPAL COMPONENTS OF SAMPLE TEXTS
APPLICATION TO AUTHOR ATTRIBUTION
Findings
CONCLUSIONS