Abstract

This paper investigates which writing style features used for the Authorship Identification task between 2003 and 2015 remain effective on cross-topic and cross-genre documents. Features previously applied to single-topic, single-genre Authorship Identification were evaluated empirically, using an ablation process, to determine whether they transfer to the cross-topic and cross-genre setting. The dataset was the English collection of the 2015 PAN CLEF Forum, which consists of 100 sets. The study also investigated whether combining some of these feature sets improves authorship identification. Three classifiers were used: Naïve Bayes, Support Vector Machine, and Random Forest. The results suggest that a combination of lexical, syntactic, structural, and content feature sets can be used effectively for cross-topic and cross-genre authorship identification, achieving an AUC of 0.837.
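As a rough illustration of how an ablation over feature groups scored by AUC might be organised, the following minimal Python sketch may help. It is not the authors' implementation. It assumes a leave-one-group-out form of ablation, which the abstract does not specify, and the names `FEATURE_GROUPS`, `ablation_subsets`, `run_ablation`, and the `evaluate` callback are all hypothetical. In practice `evaluate` would train one of the classifiers on the selected feature groups and return its AUC.

```python
# Hypothetical group names, taken from the feature families named in the abstract.
FEATURE_GROUPS = ("lexical", "syntactic", "structural", "content")


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def ablation_subsets(groups):
    """Yield the full set of groups, then each set with one group removed.

    Assumption: leave-one-group-out ablation; the paper's exact
    procedure may differ.
    """
    yield tuple(groups)
    for dropped in groups:
        yield tuple(g for g in groups if g != dropped)


def run_ablation(groups, evaluate):
    """Map each feature-group subset to the score returned by `evaluate`.

    `evaluate` is a caller-supplied function (hypothetical here) that
    trains a classifier on the given groups and returns its AUC.
    """
    return {subset: evaluate(subset) for subset in ablation_subsets(groups)}
```

Comparing the score of each reduced subset with the score of the full set indicates how much each feature group contributes: a large drop when a group is removed suggests that the group carries useful signal.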
