Abstract

Michael Oakes is a Reader in Computational Linguistics at the University of Wolverhampton and an expert in statistical and corpus-based methods for language research. This book brings together, in textbook style, both recent and older research on applying quantitative methods to the automatic analysis of authorship and author profiles. These analyses are based on properties of the text, in a discipline often called computational stylometry. Other language-related 'detective work', such as the decipherment of old scripts and plagiarism detection, is discussed as well.

The currently dominant approach in computational stylometry is based on automatic text categorization, the field of computer science that also gave us spam filtering. For author identification, for example, this approach involves three steps:

(1) defining a representation of texts in terms of (mainly) linguistic properties;
(2) training a model with statistical or machine learning methods on such representations of texts whose authorship is known;
(3) applying the learned model to texts of unknown authorship to decide who wrote them.

The approach assumes that well-chosen linguistic properties allow the quantitative methods in step (2) to learn how individual authors differ in style. On this view, an author's style is an idiosyncratic combination of preferences in the use of the linguistic properties captured in the text representations. Authorship methods based on some form of similarity or overlap between the linguistic properties of different texts also fit this scheme, as instances of nearest-neighbor classification. Besides this supervised learning, unsupervised techniques are also used: after step (1), the texts are automatically clustered into groups with similar stylistic properties.
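The sketch below illustrates steps (1) to (3) in the nearest-neighbor form described above. It is a hypothetical illustration, not the book's method. The feature set (ten English function words), the relative-frequency representation and the Manhattan distance are all assumptions chosen for brevity. The helper names `represent`, `train` and `attribute` are likewise hypothetical, and the "known" texts are made-up toy strings rather than real data.

```python
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "is", "was"]


def represent(text: str) -> list:
    """Step (1): map a text to relative frequencies of the chosen function words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[w] / total for w in FUNCTION_WORDS]


def manhattan(u, v) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def train(labelled_texts):
    """Step (2): for nearest-neighbor classification, 'training' simply stores
    the representation of every text whose author is known."""
    return [(represent(text), author) for text, author in labelled_texts]


def attribute(model, unknown_text: str) -> str:
    """Step (3): assign the author of the most similar known text."""
    x = represent(unknown_text)
    _, author = min(model, key=lambda pair: manhattan(pair[0], x))
    return author


# Toy usage with invented strings (not real authorship data).
known = [
    ("the cat sat on the mat and the dog sat in the hall", "A"),
    ("it is what it is and that was that", "B"),
]
model = train(known)
print(attribute(model, "the bird sat on the fence and the fox sat in the den"))  # prints "A"
```

Storing every labelled text and comparing against all of them keeps the example close to the nearest-neighbor view of similarity-based methods. Real stylometric systems typically use far larger feature sets and normalized distances.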
