Abstract
Forensic authorship authentication plays a vital role in identifying unknown authors, a need that has grown with the world’s rapidly rising internet use. This paper presents a two-level learning technique for authorship authentication. Rather than relying on learning alone, the technique is supplied with linguistic knowledge, statistical features, and vocabulary features to enhance its efficiency. The linguistic knowledge is represented through lexical analysis features such as part of speech. The two-level classifier is designed to capture the best predictive performance for identifying authorship. The first classifier is based on vocabulary features that capture how frequently each author uses certain words. Its results are fed to the second classifier, which is based on a learning technique and depends on lexical, statistical, and linguistic features. All three sets of features describe the author’s writing style in numerical form. Many new features are proposed in this work for identifying the author’s writing style. Although the proposed methodology is tested on Arabic writings, it is general and can be applied to any language. Depending on the machine learning model used, the experiments show that the trained two-level classifier achieves an accuracy ranging from 94% to 96.16%.
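The two-level idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the whitespace tokenizer, the cosine score used at the first level, the three style features, and the nearest-centroid rule at the second level are all assumptions chosen to keep the example short, and every function name is hypothetical.

```python
from collections import Counter
import math

def tokenize(text):
    # Assumption: plain whitespace tokenization (real Arabic text needs more care).
    return text.lower().split()

def vocab_profile(texts):
    """Relative word frequencies over a list of texts."""
    counts = Counter(w for t in texts for w in tokenize(t))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def cosine(p, q):
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def level_one_scores(text, profiles):
    """Level 1: similarity of the text's vocabulary to each author's vocabulary."""
    doc = vocab_profile([text])
    return [cosine(doc, profiles[a]) for a in sorted(profiles)]

def style_features(text):
    """A few simple statistical features (assumed, not the paper's feature set)."""
    words = tokenize(text)
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return [
        sum(len(w) for w in words) / len(words),  # mean word length
        len(words) / len(sentences),              # mean sentence length in words
        len(set(words)) / len(words),             # type-token ratio
    ]

def level_two_vector(text, profiles):
    # Level 2 input: level-1 scores concatenated with the style features.
    return level_one_scores(text, profiles) + style_features(text)

def train(corpus):
    """corpus maps author -> list of non-empty texts."""
    profiles = {a: vocab_profile(ts) for a, ts in corpus.items()}
    centroids = {}
    for a, ts in corpus.items():
        vecs = [level_two_vector(t, profiles) for t in ts]
        centroids[a] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return profiles, centroids

def predict(text, profiles, centroids):
    """Level 2: assign the author whose centroid is nearest (Euclidean distance)."""
    v = level_two_vector(text, profiles)
    return min(centroids, key=lambda a: math.dist(v, centroids[a]))

corpus = {
    "A": ["the cat sat on the mat. the cat ran.",
          "the cat likes the mat. the mat is soft."],
    "B": ["stock markets fell sharply today amid inflation fears.",
          "inflation fears pushed stock markets lower again today."],
}
profiles, centroids = train(corpus)
guess = predict("the cat sat on the mat. the mat is soft.", profiles, centroids)
```

The features here are left unscaled, so the sentence-length feature can dominate the distance; a practical system would normalize the features and replace the nearest-centroid rule with a trained machine learning model.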
Highlights
Forensic authorship authentication aims to identify the principal author of an unknown article [1]
The main idea is that each author has a distinct writing style [2]
Throughout this section, we present a review of the approaches for Arabic authorship authentication including machine learning-based authorship authentication and various types of stylometric features
Summary
Forensic authorship authentication aims to identify the principal author of an unknown article [1]. The main idea is that each author has a distinct writing style [2]. This works because certain uncontrollable habits in an author’s writing have proven to be reliable identifiers over time. The instance-based approach extracts writing-style features from each article of each author separately, which allows it to capture any variation in the style of writing. Profile-based methods extract writing features after concatenating all the articles belonging to a specific author into one large file. This helps to identify the author’s most uncontrolled behaviors and the characteristic features of their writing style. A combination of both approaches is proposed to improve the performance of the authentication process.
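The difference between the instance-based and profile-based approaches can be sketched as follows. This is an illustrative example only: relative word frequency stands in for whatever style features are actually used, and the function names are hypothetical.

```python
from collections import Counter

def word_freqs(text):
    """Relative frequency of each whitespace-separated word in a text."""
    words = text.split()
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def instance_based(corpus):
    """One feature dict per article, so variation between articles is kept."""
    return {author: [word_freqs(a) for a in articles]
            for author, articles in corpus.items()}

def profile_based(corpus):
    """One feature dict per author, built from all articles concatenated."""
    return {author: word_freqs(" ".join(articles))
            for author, articles in corpus.items()}

corpus = {"A": ["a b", "a a"]}
per_article = instance_based(corpus)
per_author = profile_based(corpus)
```

With this toy corpus, the instance-based view keeps two separate feature sets for author A, while the profile-based view merges them into a single set in which "a" accounts for three of the four words.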