Abstract
Feature extraction is a common problem in statistical pattern recognition. It refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed in such a way that the data set may be represented by a reduced number of "effective" features and yet retain most of the intrinsic information content of the data; in other words, the data set undergoes a dimensionality reduction. Principal component analysis is one of these processes. In this paper the data collected by counting selected syntactic characteristics in around a thousand paragraphs of each of the sample books underwent a principal component analysis. Authors of texts identified by the competitive neural networks, which use these effective features.
Highlights
Problems of authorship have always been attacked with traditional research methods: unearthing and dating original manuscripts, for instance
In this paper the data collected by counting selected syntactic characteristics in around a thousand paragraphs of each of the sample books underwent a principal component analysis
In this paper instead of cluster analysis of the two dimensional plots, the author attribution will be found by the use of artificial neural networks with output neurons competing on the data of first principal components
Summary
Problems of authorship have always been attacked with traditional research methods: unearthing and dating original manuscripts, for instance. Despite the fact that Hamilton and Madison have otherwise very similar styles, nearly identical sentence length distributions, as noted by Juola [5], Mosteller and Wallace found sharp differences in their preference for different function words: for instance, the word “upon” appears 3.24 times per 1000 words in Hamilton, and just 0.23 times in Madison [1]. Adjusting these frequencies with a Bayesian model, they showed that Madison had most likely written all 12 disputed papers. This technique is going to be elaborated
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.