Abstract

In recent years, author identification has become an active research area, in which the major differences in writing style arise from the medium (paper or online), the mode of entry, and the target audience. Much research has been devoted to analyzing writing styles in handwritten, word-processed, and online social network (OSN) texts. Word-processing editors, which typically include spelling and grammar checkers, may influence writing style because they allow an individual to edit a piece of text to perfection. Thus, similarities may exist between OSN and word-processed texts. Moreover, no study to date has made a detailed comparison of writing styles across multidisciplinary factors. This paper attempts to close the gap between the writing styles of the pre- and post-Internet periods and provides an in-depth comparison of writing styles in OSN texts across three major factors: demographics, personality and behavior, and cybersecurity. The aim is to learn from past literature as we advance these techniques to OSN texts. To that end, we also propose a novel machine learning prediction model, based on tense morphology, that classifies age and gender on English blogs and the PAN 2013 dataset. This model achieves an accuracy of 94%-98% for age and 95%-97% for gender.

Highlights

  • Author identification has been an active research field for centuries [1]

  • Presents speech variation as a writing style in online social network (OSN) texts across demographics and cybersecurity

  • The results showed that combining stylometric features yielded the best accuracy across different statistical and machine learning approaches to authorship attribution, even for short messages such as chat posts


Summary

Introduction

Author identification has been an active research field for centuries [1]. The rationale behind this research is that certain style characteristics can be extracted from a piece of writing and that those characteristics are unique to an individual. Author identification research covers authorship attribution and authorship verification. Authorship attribution is the task of identifying the exact author of a document, while authorship verification determines whether the same author wrote a given set of documents [2]. Users’ heuristic behavior will show in their online writings if they spend a great deal of time online. Features such as structural traits, language usage, content markers, and stylistic features can be used to form a feature collection for author identification. Regardless of whether individuals write under a pseudonym, an authornym, or their real name,

The associate editor coordinating the review of this manuscript and approving it for publication was Shirui Pan.

