Abstract

In this work, we propose a novel approach to the authorship identification task in a cross-topic, open-set scenario. Authorship verification is the task of determining whether or not two texts were written by the same author. We model each document as a graph, and a graph neural network then extracts relevant features from these graph representations. We present three strategies for representing texts as graphs based on the co-occurrence of the POS labels of words. We propose a Siamese network architecture composed of graph convolutional networks along with pooling and classification layers. We present different variants of the architecture and discuss the performance of each one. To evaluate our approach, we used a collection of fanfiction texts provided by the PAN@CLEF 2021 shared task in two settings: a “small” corpus and a “large” corpus. Our graph-based approach achieved average scores (over AUC ROC, F1, Brier score, F0.5u, and C@1) of 90% and 92.83% when trained on the “small” and “large” corpus, respectively. Our model obtains results comparable to the state of the art in this task and better than those of traditional baselines.

Highlights

  • Authorship analysis aims to identify characteristics of an author’s writing style given a text sample, and to identify the author

  • The main contribution of this work is a novel Siamese network architecture composed of two graph convolutional neural networks, pooling, and classification layers to approach the authorship verification task

  • We found two relevant approaches used for the authorship verification task, but both model the texts in a sequential manner


Summary

Introduction

Authorship analysis aims to identify characteristics of an author’s writing style given a text sample, and to identify the author. The idea behind this research area is that some features of documents make it possible to distinguish texts written by different authors [1]. The authorship verification task aims to determine whether two given texts were written by the same author. The main contribution of this work is a novel Siamese network architecture composed of two graph convolutional neural networks, pooling, and classification layers to approach the authorship verification task. We present three strategies (short, med, and full) for representing texts as graphs based on the relation of the POS labels and the co-occurrence of the words. Our motivation is that a graph representation provides structural information that is not available when texts are processed in the traditional sequential manner.
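
To make the idea of a POS-based co-occurrence graph concrete, the following is a minimal sketch, not the authors' implementation. It assumes the text has already been POS-tagged, that nodes are POS labels, and that a directed edge links the labels of words appearing within a fixed window, weighted by how often that happens. The function name `pos_cooccurrence_graph`, the `window` parameter, and the tag set in the sample are illustrative assumptions; the short, med, and full strategies are not specified in this summary and are not reproduced here.

```python
from collections import Counter


def pos_cooccurrence_graph(tagged_tokens, window=1):
    """Build a directed, weighted graph over POS labels (illustrative only).

    tagged_tokens: list of (word, pos_label) pairs in reading order.
    window: how many following tokens each token is linked to (assumed).
    Returns (nodes, edges) where edges maps (src_label, dst_label) -> count.
    """
    tags = [tag for _, tag in tagged_tokens]
    edges = Counter()
    for i, src in enumerate(tags):
        # Link the current label to the labels of the next `window` tokens.
        for j in range(i + 1, min(i + 1 + window, len(tags))):
            edges[(src, tags[j])] += 1
    nodes = sorted(set(tags))
    return nodes, dict(edges)


# Hypothetical pre-tagged sentence; any tagger and tag set could be used.
sample = [
    ("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
    ("on", "ADP"), ("the", "DET"), ("mat", "NOUN"),
]
nodes, edges = pos_cooccurrence_graph(sample)
```

The resulting edge weights capture which label transitions an author favors, which is structural information that a purely sequential encoding does not expose directly. In a Siamese setup, one such graph per document would be fed to each branch of the network.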
