Abstract

1 Authorship attribution and plagiarism analysis

The Internet has facilitated both the dissemination of anonymous texts and the easy “borrowing” of other people's ideas and words. This has raised a number of important questions about authorship. Can we identify the anonymous author of a text by comparing it with the writings of known authors? Can we determine whether a text, or parts of it, has been plagiarized? Such questions are clearly of both academic and commercial importance.

The task of determining or verifying the authorship of an anonymous text based solely on internal evidence is a very old one. It dates back at least to the medieval scholastics, for whom reliably attributing a given text to a known ancient authority was essential to determining the text’s veracity. More recently, authorship attribution has gained greater prominence because of new applications in forensic analysis, humanities scholarship, and electronic commerce, and because of the development of computational methods for addressing the problem.

Over the last century and more, a great variety of methods have been applied to authorship attribution problems of various sorts. One can roughly trace the evolution of these methods through three main stages. In the earliest stage, researchers sought a single numeric function of a text that would discriminate between authors. In a later stage, statistical multivariate discriminant analysis was applied to word frequencies and related numerical features. Most recently, machine learning methods and high-dimensional textual features have been applied to sets of training documents to