Authorship Identification of Electronic Texts

Mahmoud Khonji, Loubna Mekouar, Youssef Iraqi

doi:10.1109/access.2021.3098192


Open Access

https://doi.org/10.1109/access.2021.3098192


Abstract

Electronic text stylometry is concerned with analyzing the writing styles of input electronic texts to extract information about their authors. Such extracted information could be, for example, the authors’ identity or other attributes, such as their gender and age group. This survey paper presents the following contributions: 1) A description of all stylometry problems in probability terms, under a unified notation. 2) A survey of data representation (or feature extraction) methods. 3) A comprehensive evaluation of 23,760 feature extraction methods, followed by a thorough discussion of the results. This extensive evaluation is critical since the known data representation methods are often not evaluated under the same unified testbed.

Highlights

Improving solvers of stylometry problems is essential for enhancing various application domains, such as forensics, privacy, active-authentication [1]–[3], the detection of compromised accounts [4], recommender systems [5], deception detection, market analysis, and medical diagnosis [6], [7].
Features evaluation results: This evaluation aims to identify the properties of the feature extraction functions that correspond to increases in classification accuracy. Since the evaluation tests many feature extraction functions that are special cases of the at least l-frequent dir-directed k-skipped n-grams, the properties whose effects on classification accuracy are evaluated are l, dir, k, n, and grams.
This paper introduced electronic text stylometry problems under a unified notation in probability terms, their importance in enhancing various upper-layer applications, the key challenges currently faced in this field, the critical limitations of stylometry problem solvers, and suggestions for future directions to solve them.
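
As a rough illustration of the feature family named in the highlights above, the following Python sketch extracts k-skipped n-grams and keeps only those that occur at least l times. It is not the paper's implementation. The function names are hypothetical, and two definitions are assumptions made for this sketch: "k-skipped" is read as exactly k items skipped between consecutive elements of a gram, and "at least l-frequent" is read as a total count of at least l across the supplied documents. The dir property is not modeled, and the gram type (words, characters, and so on) is simply whatever sequence the caller passes in.

```python
from collections import Counter


def skipped_ngrams(tokens, n, k=0):
    """Return n-grams whose consecutive elements are exactly k items apart.

    Assumed definition for this sketch; with k=0 it yields ordinary n-grams.
    """
    step = k + 1
    span = (n - 1) * step + 1  # number of items one gram stretches over
    return [
        tuple(tokens[i + j * step] for j in range(n))
        for i in range(len(tokens) - span + 1)
    ]


def frequent_features(documents, n, k=0, l=1):
    """Keep grams seen at least l times across all documents (assumed reading)."""
    totals = Counter()
    for tokens in documents:
        totals.update(skipped_ngrams(tokens, n, k))
    return {gram for gram, count in totals.items() if count >= l}


def featurize(tokens, vocabulary, n, k=0):
    """Count, for one document, only the grams that belong to the vocabulary."""
    counts = Counter(skipped_ngrams(tokens, n, k))
    return {gram: counts[gram] for gram in vocabulary if counts[gram] > 0}


if __name__ == "__main__":
    docs = ["the cat sat on the mat".split(), "the cat ran".split()]
    vocab = frequent_features(docs, n=2, k=0, l=2)
    print(vocab)
    print(featurize(docs[0], vocab, n=2, k=0))
```

Passing a string instead of a token list gives character grams, so the same helper covers more than one gram type. Varying n, k, and l over a grid is one way a large number of feature extraction functions could arise from a single parameterized family.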

Summary

Introduction

Improving solvers of stylometry problems is essential for enhancing various application domains, such as forensics, privacy (or anti-forensics), active-authentication [1]–[3], the detection of compromised accounts [4], recommender systems [5], deception detection, market analysis, and medical diagnosis [6], [7]. Author identification can be accurately performed on program source code [8], [9] as well as on compiled binaries [10]. Enhancing such application domains is becoming increasingly interesting thanks to the availability of large amounts of textual data via the Internet. Electronic text stylometry problems aim at inferring information about the authors of input electronic texts. Such inferred information could be the identity of the authors, their genders, age groups, personality types, or even the diagnosis of specific illnesses [6], [7], [11]–[15]. A common taxonomy of electronic text stylometry problem solvers that is often followed by the literature is as follows:



Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 4	License type: CC BY 4.0


Similar Papers

The Ideal Data Representation for Feature Extraction of Traditional Malay Musical Instrument Sounds Classification
Norhalina Senan ... Musa Mohd Mokji
01 Jan 2009

Sign Language Recognition Using Motion History Volume and Hybrid Neural Networks
Ho-Joon Kim ...
International Journal of Machine Learning and Computing
01 Jan 2012

Representing molecular and materials data for unsupervised machine learning
E Swann ... A S Barnard
Molecular Simulation | VOL. 44
02 Apr 2018

Optimized data representation and understanding method for the intelligent design of shear wall structures
Jin Han ... Hongjing Xue
Engineering Structures | VOL. 315
01 Jul 2024
