Automatic Identification of Authors’ Stylistics and Gender on the Basis of the Corpus of Russian Fiction Using Extended Set-theoretic Model with Collocation Extraction

Alexandr Osochkin, Xenia Piotrowska, Vladimir Fomin

doi:10.53482/2021_50_389

Abstract

We present a novel quantitative approach to classifying authors' style and gender differences based on the extraction of word collocations. The proposed algorithm mitigates previously described problems of text processing with vector models. We demonstrate the approach on a corpus of Russian prose. We discuss different approaches to classifying and identifying an author's style as implemented in currently available software and libraries for morphological analysis, together with methods of parameterization, text indexing, artificial intelligence algorithms, and knowledge extraction. Our results demonstrate the efficiency and relative advantage of regression decision tree methods in identifying informative frequency indexes in a way that lends itself to logical interpretation. We develop a toolkit for comparative experiments that assess how effectively natural-language text data can be classified using three models of text representation: the vector model, the set-theoretic model, and the authors' own set-theoretic model with collocation extraction. Comparing how well the different methods identify the style and gender of authors of fiction, we find that the proposed approach, which incorporates collocation information, alleviates some of the previously identified deficiencies and yields an overall improvement in classification accuracy.
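To make the idea of collocation-based frequency indexes more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that a collocation can be approximated by an adjacent word pair (bigram) and that a frequency index is the relative frequency of that pair in a text. The function names `tokenize`, `collocation_counts`, and `frequency_index`, the `min_count` threshold, and the sample sentence are all hypothetical and chosen only for illustration; the abstract does not specify how collocations are extracted or weighted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters (Unicode-aware, so Cyrillic works)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def collocation_counts(text, min_count=2):
    """Count adjacent word pairs and keep those seen at least min_count times.

    Assumption: a 'collocation' is approximated here by a frequent bigram.
    """
    tokens = tokenize(text)
    pairs = Counter(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in pairs.items() if n >= min_count}


def frequency_index(text, vocabulary):
    """Return the relative frequency of each vocabulary bigram in the text.

    The resulting list can serve as one feature row for a classifier.
    """
    tokens = tokenize(text)
    pairs = Counter(zip(tokens, tokens[1:]))
    total = max(len(tokens) - 1, 1)  # number of bigram positions, at least 1
    return [pairs[pair] / total for pair in vocabulary]


# Hypothetical sample sentence, used only to exercise the functions.
sample = "белая ночь и белая ночь и тихий дом"
vocabulary = [("белая", "ночь"), ("тихий", "дом"), ("нет", "такого")]
features = frequency_index(sample, vocabulary)
```

Feature rows built this way for each text could then be passed to a decision tree learner, which is the family of methods the abstract reports as most interpretable; the choice of learner and its settings is left open here.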

Published in: Glottometrics
