Abstract

This paper presents an approach to authorship verification, a forensic text problem that consists of determining whether an unknown document was written by a particular author, given some samples of that author’s writing. The core of the approach is a graph representation from which relevant linguistic features are extracted using network analysis techniques. Graphs provide rich data structures for representing lexical and syntactic aspects of texts, allowing centrality measures to be reinterpreted as a way to extract linguistic features that do not depend entirely on stylistic elements of text documents. The proposed method is applied to the English-language partitions of the CLEF PAN 2014 and 2015 author verification datasets. It produces competitive results that outperform the state-of-the-art baselines and approach, or in one case surpass, the best results reported so far on the same training and test corpora. These experimental results show that our interpretation of four centrality measures (closeness, betweenness, degree, and eigenvector) makes it possible to detect relevant patterns of an author’s writing style. In particular, words with high closeness that are part of chunk phrases, and words with high betweenness that are included in bigrams and trigrams, contribute more effectively to verifying document authorship.
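To make the idea of centrality over a text graph concrete, the following is a minimal sketch and not the method used in the paper. It assumes the simplest possible graph, in which each distinct word is a node and an undirected edge links words that appear next to each other; the paper's representation also encodes syntactic information, which is omitted here. Only two of the four measures (degree and closeness) are shown, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def build_cooccurrence_graph(tokens):
    """Undirected graph linking each word to its immediate neighbours.

    Assumption: adjacency in the token sequence is the only relation.
    """
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # skip self-loops from repeated words
            graph[left].add(right)
            graph[right].add(left)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neigh) / (n - 1) for node, neigh in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path distances (BFS)."""
    scores = {}
    for source in list(graph):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Toy example: "the" links to three of the four other words.
tokens = "the cat sat on the mat".split()
graph = build_cooccurrence_graph(tokens)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree["the"], closeness["the"])
```

In this toy sentence, "the" has the highest degree and closeness because it connects to "cat", "on", and "mat" directly. Features for a verification model could then be derived from such per-word scores, although how the paper maps scores to features is not specified in this abstract.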
