Learning Stylometric Representations for Authorship Analysis.

Steven H H Ding, Farkhund Iqbal, William K Cheung, Benjamin C M Fung

doi:10.1109/tcyb.2017.2766189

Abstract

Authorship analysis (AA) is the study of unveiling the hidden properties of authors from textual data. It extracts an author's identity and sociolinguistic characteristics based on the writing styles reflected in the text. The process is essential for various areas, such as cybercrime investigation, psycholinguistics, and political socialization. However, most previous techniques depend critically on manual feature engineering. Consequently, the choice of feature set has been shown to be scenario- or dataset-dependent. In this paper, to mimic the human sentence composition process using a neural network approach, we propose to incorporate different categories of linguistic features into the distributed representation of words in order to simultaneously learn writing style representations from unlabeled texts for AA. In particular, the proposed models allow topical, lexical, syntactical, and character-level feature vectors of each document to be extracted as stylometrics. We evaluate the performance of our approach on the problems of authorship characterization, authorship identification, and authorship verification with Twitter, blog, review, novel, and essay datasets. The experiments suggest that our proposed text representation outperforms static stylometrics, dynamic n-grams, latent Dirichlet allocation, latent semantic analysis, the distributed memory model of paragraph vectors, the distributed bag of words version of paragraph vectors, word2vec representations, and other baselines.
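
To make the authorship identification task concrete, the following is a minimal sketch of a character n-gram baseline, the general kind of hand-crafted feature the abstract compares against. It is not the authors' model: the function names (char_ngrams, cosine, identify_author), the trigram size, and the cosine scoring are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    # An empty profile (text shorter than n) has no direction; score it 0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_author(known_texts, query, n=3):
    """Return the candidate author whose profile is closest to the query.

    known_texts maps a (hypothetical) author name to a sample of their writing.
    """
    query_profile = char_ngrams(query, n)
    scores = {
        author: cosine(char_ngrams(text, n), query_profile)
        for author, text in known_texts.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    known = {
        "alice": "the quick brown fox jumps over the lazy dog",
        "bob": "zzzz yyyy xxxx wwww",
    }
    best, scores = identify_author(known, "the quick brown fox")
    print(best, scores)
```

A fixed feature set like this must be chosen by hand for each scenario, which is the dependence on manual feature engineering that the abstract says the learned representations are meant to avoid.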

Journal: IEEE Transactions on Cybernetics
Publication Date: Nov 21, 2017
Citations: 128

Similar Papers

Explainable Authorship Identification in Cultural Heritage Applications
Mattia Setzu ... Anna Monreale
Journal on Computing and Cultural Heritage | VOL. 17
24 Jun 2024

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement
Jordan M Wheeler ... Shiyu Wang
Journal of Educational and Behavioral Statistics | VOL. 49
27 Nov 2023

A Computational Approach Based on Syntactic Levels of Language in Authorship Attribution
Paulo Junior Varela ... Luiz Eduardo Soares Oliveira
IEEE Latin America Transactions | VOL. 14
01 Jan 2015

Improve topic modeling algorithms based on Twitter hashtags
Hayder M Alash ... Ghaidaa A Al-Sultany
Journal of Physics: Conference Series | VOL. 1660
01 Nov 2020
