Abstract

Electronic text Author Attribution (AA) is a well-known stylometry problem: inferring the identity of the authors of disputed electronic texts solely by analyzing the texts themselves. It is important for applications such as forensics and market analysis. However, the state of the art in author identification has never been evaluated against Emirati social media texts, partly because no suitable evaluation dataset exists for that domain. This paper presents the first such evaluation, along with the release of the Khonji-Iraqi Emirati Tweets author identification evaluation dataset with 30 authors (KIT30). Additionally, novel definitions of grams, namely compound grams, are introduced; decision models that use them are shown to achieve higher classification accuracies than models that follow classical definitions of grams. The findings also indicate that, when a suitable data representation is used, the degradation in classification accuracy as the space of suspect authors grows is not necessarily as sharp as previously reported in the literature. This suggests that AA problem solvers can be significantly more scalable than previous evaluations indicated.
