Abstract

This paper demonstrates how an author recognition system could be benchmarked as a prerequisite for admission in court. The system used in the demonstration is the FEDERALES system, and the experimental data were taken from the British National Corpus. The system was given several tasks, namely attributing a text sample to a specific text, verifying that a text sample was taken from a specific text, and verifying that a text sample was produced by a specific author. For the former two tasks, 1,099 texts with at least 10,000 words were used; for the latter, 1,366 texts with known authors were used, verified against models for the 28 authors for whom there were three or more texts. The experimental tasks were performed with different sampling methods (sequential samples or samples of concatenated random sentences), different sample sizes (1,000, 500, 250 or 125 words), varying amounts of training material (between 2 and 20 samples) and varying amounts of test material (1 or 3 samples). Under the best conditions, the system performed very well: with 7 training and 3 test samples of 1,000 words of randomly selected sentences, text attribution had an equal error rate of 0.06% and text verification an equal error rate of 1.3%; with 20 training and 3 test samples of 1,000 words of randomly selected sentences, author verification had an equal error rate of 7.5%. Under the worst conditions, with 2 training samples and 1 test sample of 125 words of sequential text, the equal error rates for text attribution and text verification were 26.6% and 42.2%, and author verification did not perform better than chance. Furthermore, the quality degradation curves under gradually worsening conditions were not smooth but contained steep drops. All in all, the results show the importance of having a benchmark that is as similar as possible to the actual court material for which the system is to be used, since the measured system quality differed greatly between evaluation scenarios and system degradation could not easily be predicted on the basis of the chosen scenario parameters.
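Since all results above are reported as equal error rates, the following is a minimal, hypothetical sketch of how an equal error rate can be estimated from verification scores: it is the error level at the decision threshold where the false-accept rate and the false-reject rate coincide. This is not the FEDERALES scoring procedure, which is not described here; the function name and the synthetic score distributions are illustrative assumptions.

```python
# Hypothetical sketch: estimating an equal error rate (EER) from
# similarity scores for genuine trials (same text/author) and impostor
# trials (different text/author). Higher score means "accept".
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Return the EER: the point where false-accept and false-reject rates meet."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # false rejections: genuine trials below threshold
        far = np.mean(impostor_scores >= t)  # false acceptances: impostor trials at/above threshold
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Illustrative usage with synthetic, normally distributed scores.
rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 0.5, 1000)
impostor = rng.normal(0.0, 0.5, 1000)
print(f"EER ~ {equal_error_rate(genuine, impostor):.3f}")
```

Sweeping over the observed scores, as above, gives a step-wise approximation; interpolating between the nearest crossing of the two error curves is a common refinement.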

Highlights

  • Author recognition and author profiling, i.e., attempts to deduce the identity or characteristics of the author of a text on the basis of observable properties of that text, have a venerable tradition

  • I present three tasks: a) recognizing from which of two texts samples were taken; b) verifying whether samples were taken from a specific text; and c) verifying whether samples were written by a specific author

  • In several experiments I benchmarked an author recognition system, varying several aspects in order to examine the influence of changing circumstances on system quality; the two sampling methods involved are sketched after this list
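The abstract varies both the sampling method (sequential samples versus samples of concatenated randomly selected sentences) and the sample size (1,000, 500, 250 or 125 words). The sketch below is a hypothetical illustration of those two sampling strategies, not the procedure actually applied to the British National Corpus texts; the function names, the toy text and the sentence splitting on full stops are assumptions made for the example.

```python
# Hypothetical sketch of the two sampling methods described in the abstract.
import random

def sequential_sample(words, size, rng):
    """A contiguous block of `size` words starting at a random offset."""
    start = rng.randrange(max(1, len(words) - size + 1))
    return words[start:start + size]

def random_sentence_sample(sentences, size, rng):
    """Concatenate randomly selected sentences until `size` words are collected."""
    pool = list(sentences)
    rng.shuffle(pool)
    sample = []
    for sent in pool:
        sample.extend(sent.split())
        if len(sample) >= size:
            break
    return sample[:size]

# Toy usage: a repeated sentence stands in for a corpus text of >= 10,000 words.
rng = random.Random(1)
text = "the quick brown fox jumps over the lazy dog . " * 1200
words = text.split()
sentences = [s.strip() for s in text.split(".") if s.strip()]
print(len(sequential_sample(words, 125, rng)),
      len(random_sentence_sample(sentences, 125, rng)))
```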

Introduction

Author recognition and author profiling, i.e., attempts to deduce the identity or characteristics of the author of a text on the basis of observable properties of that text, have a venerable tradition. Investigations were done by hand as far back as the 15th century (Valla 1439/1440). They were certainly not always ad hoc, as can be seen from the work of Wincenty Lutosławski (1890). Manual investigation is applied even today, it appears.

