Abstract

In this work, we use machine learning techniques to address a research question about the authorship of two famous nineteenth-century essays. On Liberty (1859) and The Subjection of Women (1869) were published under the name of John Stuart Mill, a widely studied nineteenth-century British philosopher. Mill himself attributed them to collaboration with his wife and partner, Harriet Taylor Mill. More than 150 years later, the question remains whether John Stuart Mill was the sole author of these two canonical texts in the history of political thought. Experts are divided on whether to take John Stuart Mill’s attribution at face value, since Harriet Taylor Mill died in 1858. To address this question, we use a dataset consisting of essays by both authors to train three state-of-the-art classifiers that learn to distinguish the writing style of each author. We then use the trained models to attribute the two essays of disputed authorship to one of the two authors. The classifiers learn the two classes very well and achieve high accuracy on the validation set. On the test set, most of the models attribute the two essays to John Stuart Mill; however, some chunks of text in both essays show the contribution of Harriet Taylor Mill. These results, we conclude, explain why experts are divided on this particular research question.

Highlights

  • Computer-based or computer-assisted authorship identification tries to answer an old question with new means: who is the real author of a piece of writing

  • These methods go beyond statistics, while making use of machine learning and artificial intelligence methods including deep learning (DL)

  • Texts by Helen Taylor are not included, both to simplify the models and because we focus on the collaboration between John Stuart Mill and Harriet Taylor Mill

Summary

Introduction

Computer-based or computer-assisted authorship identification tries to answer an old question with new means: who is the real author of a piece of writing? In recent years, scholars have proposed a number of advanced, automated authorship identification methods within the area of text mining, each adapted to a specific research question. These methods go beyond statistics and make use of machine learning and artificial intelligence methods, including deep learning (DL). It is well known that DL methods work better when trained on many examples. For the task at hand, we think DL is not the proper method to use, because our data are insufficient in volume. By contrast, in work where a dataset of 50M tweets is used for training, the decision to use DL is reasonable.
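
To make the idea of chunk-level attribution with little training data concrete, the following is a minimal sketch in Python. It is not the method used in this work: the three classifiers are not named here, so the sketch swaps in a simple nearest-centroid rule over function-word frequencies, a common stylometric baseline. The word list, the chunk size, and every function name are assumptions introduced for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical function-word list, chosen for illustration (not from the paper).
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that",
                  "is", "it", "which", "but", "not", "by"]


def chunk_words(text, size):
    """Lowercase the text and split it into consecutive chunks of `size` words.

    The last chunk may be shorter than `size`.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [words[i:i + size] for i in range(0, len(words), size)]


def profile(words):
    """Relative frequency of each function word within one chunk."""
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on an empty chunk
    return [counts[w] / total for w in FUNCTION_WORDS]


def centroid(profiles):
    """Component-wise mean of a list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(texts_by_author, size):
    """Build one centroid per author from all chunks of that author's texts."""
    return {
        author: centroid([profile(chunk)
                          for text in texts
                          for chunk in chunk_words(text, size)])
        for author, texts in texts_by_author.items()
    }


def attribute(text, centroids, size):
    """Label each chunk with its most similar author; return labels and vote counts."""
    labels = [
        max(centroids, key=lambda author: cosine(profile(chunk), centroids[author]))
        for chunk in chunk_words(text, size)
    ]
    return labels, Counter(labels)
```

In use, `train` would receive the undisputed essays of each author and `attribute` one disputed essay. The per-chunk labels show which passages lean toward which author, while the vote counts give an essay-level attribution. A real study would also need a held-out validation set and a far larger feature set than this toy word list.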

