Graph-Based Siamese Network for Authorship Verification

Daniel Embarcadero-Ruiz, Alberto Embarcadero-Ruiz, Helena Gómez-Adorno, Gerardo Sierra

doi:10.3390/math10020277


Open Access

https://doi.org/10.3390/math10020277


Journal: Mathematics	Publication Date: Jan 17, 2022
Citations: 6	License type: CC BY 4.0

Affiliation: Universidad Nacional Autónoma de México

Abstract

In this work, we propose a novel approach to solve the authorship identification task in a cross-topic and open-set scenario. Authorship verification is the task of determining whether or not two texts were written by the same author. We model the documents as graphs, and a graph neural network then extracts relevant features from these graph representations. We present three strategies to represent the texts as graphs based on the co-occurrence of the POS labels of words. We propose a Siamese network architecture composed of graph convolutional networks along with pooling and classification layers. We present different variants of the architecture and discuss the performance of each one. To evaluate our approach, we used a collection of fanfiction texts provided by the PAN@CLEF 2021 shared task in two settings: a “small” corpus and a “large” corpus. Our graph-based approach achieved average scores (over AUC ROC, F1, Brier score, F0.5u, and C@1) of 90% and 92.83% when trained on the “small” and “large” corpora, respectively. Our model obtains results comparable to the state of the art in this task and better than traditional baselines.
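The abstract lists its evaluation measures without defining them. As an illustration of one of them, the following is a minimal sketch of C@1 under its usual definition: unanswered problems are credited at the accuracy achieved on the whole set. The function name `c_at_1` and the convention that a score of exactly 0.5 means "no answer" are assumptions made for this example, not details taken from the text above.

```python
def c_at_1(y_true, scores, threshold=0.5):
    """Sketch of C@1: (n_correct + n_unanswered * n_correct / n) / n.

    Assumption: a score exactly equal to `threshold` counts as a non-answer;
    scores above it predict "same author", scores below predict "different".
    """
    n = len(y_true)
    # Problems the system declined to answer
    unanswered = sum(1 for s in scores if s == threshold)
    # Answered problems whose predicted label matches the true label
    correct = sum(
        1
        for t, s in zip(y_true, scores)
        if s != threshold and (s > threshold) == bool(t)
    )
    return (correct + unanswered * correct / n) / n


# Four verification problems: two answered correctly, one left open, one wrong
value = c_at_1([1, 0, 1, 0], [0.9, 0.2, 0.5, 0.7])
```

Under this definition, leaving a hard problem unanswered costs less than answering it incorrectly, which is why the measure suits verification systems that may abstain.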

Highlights

Authorship analysis aims to identify characteristics of an author’s writing style given a text sample, and to identify the author.
The main contribution of this work is a novel Siamese network architecture composed of two graph convolutional neural networks, pooling, and classification layers to approach the authorship verification task.
We found two relevant approaches used for the authorship verification task, but both model the texts in a sequential manner.

Summary

Introduction

Authorship analysis aims to identify characteristics of an author’s writing style given a text sample, and to identify the author. The idea behind this research area is that some features of the documents allow distinguishing texts written by different authors [1]. The authorship verification task aims to determine whether two given texts were written by the same author. The main contribution of this work is a novel Siamese network architecture composed of two graph convolutional neural networks, pooling, and classification layers to approach the authorship verification task. We present three strategies (short, med, and full) for representing texts as graphs based on the relation of the POS labels and the co-occurrence of the words. Our motivation is that a graph representation provides structural information that is not available when texts are processed in the traditional sequential manner.
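To make the idea of a co-occurrence graph concrete, here is a minimal sketch that turns an already POS-tagged token sequence into a weighted directed graph. It is a generic illustration, not the short, med, or full strategy, whose exact definitions are not given in this summary. The function name `build_cooccurrence_graph`, the `window` and `use_pos_only` parameters, and the tag names in the example are all hypothetical.

```python
from collections import Counter


def build_cooccurrence_graph(tagged_tokens, window=1, use_pos_only=True):
    """Return a Counter mapping directed edges (source, target) to counts.

    Assumption: nodes are either bare POS labels or "word/POS" strings, and
    an edge links each token to the next `window` tokens that follow it.
    """
    nodes = [
        pos if use_pos_only else f"{word.lower()}/{pos}"
        for word, pos in tagged_tokens
    ]
    edges = Counter()
    for i, source in enumerate(nodes):
        # Connect the current node to each successor inside the window
        for target in nodes[i + 1 : i + 1 + window]:
            edges[(source, target)] += 1
    return edges


# Hypothetical pre-tagged sentence; any POS tagger could supply such pairs
tagged = [
    ("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
    ("on", "ADP"), ("the", "DET"), ("mat", "NOUN"),
]
pos_graph = build_cooccurrence_graph(tagged)
word_graph = build_cooccurrence_graph(tagged, use_pos_only=False)
```

In the POS-only graph, the two determiner-noun pairs collapse into a single edge with weight 2, so the graph records how often a syntactic pattern repeats regardless of where it occurs in the text. That is the kind of structural information a purely sequential representation does not expose directly.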

Objectives

Methods

Results

Conclusion