Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features

Tomislav Reicher, Igor Belša, Artur Šilić, Ivan Krišto

doi:10.1007/978-3-642-15390-7_3

Abstract

In this work we investigate the use of various character-, lexical-, and syntactic-level features and their combinations in automatic authorship attribution. Since the majority of text representation features are language specific, we examine their application to texts written in the Croatian language. Our work differs from similar work in at least three aspects. First, we use a slightly different set of features than previously proposed. Second, we use four different data sets and compare the same features across those data sets to draw stronger conclusions. The data sets consist of articles, blogs, books, and forum posts written in Croatian. Finally, we employ a classification method based on a strong classifier. We use Support Vector Machines to learn classifiers, which achieve excellent results for longer texts: 91% accuracy and F1 measure for blogs, 93% accuracy and F1 for articles, and 99% accuracy and F1 for books. Experiments conducted on forum posts show that more complex features need to be employed for shorter texts.
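As a rough illustration of what combining feature groups and scoring by accuracy and F1 can look like, the sketch below concatenates character n-gram frequencies with function-word frequencies into one vector and computes accuracy with a macro-averaged F1. Everything in it is an assumption made for illustration: the function-word list, the n-gram vocabulary, the helper names, and the choice of macro averaging are not taken from the paper, and the SVM training step itself is left out.

```python
from collections import Counter

# Hypothetical, tiny function-word list chosen for illustration only;
# the feature set used in the paper is not specified in the abstract.
FUNCTION_WORDS = ["i", "u", "je", "se", "na", "da"]


def char_ngram_freqs(text, n, vocabulary):
    """Relative frequency of each character n-gram listed in `vocabulary`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams) or 1  # guard against texts shorter than n
    return [counts[g] / total for g in vocabulary]


def function_word_freqs(text, words=FUNCTION_WORDS):
    """Relative frequency of each function word among whitespace tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty texts
    return [counts[w] / total for w in words]


def combined_features(text, ngram_vocabulary, n=3):
    """Concatenate a character-level and a lexical-level feature group."""
    lowered = text.lower()
    return char_ngram_freqs(lowered, n, ngram_vocabulary) + function_word_freqs(lowered)


def accuracy_and_macro_f1(y_true, y_pred):
    """Accuracy and macro-averaged F1 (the averaging mode is an assumption)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    vector = combined_features("Ovo je test i to je sve", ["je ", " je"])
    print(len(vector), vector)
    print(accuracy_and_macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Vectors built this way could be passed to any SVM implementation. Concatenating groups keeps each feature family separable, so the same classifier can be retrained on any subset to compare combinations across data sets.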
