Automatic Detection of the Author's Gender: The Phenomenon of Russian Women's Prose

Anastasiya B Khazova

doi:10.25205/1818-7935-2020-18-1-22-32

Abstract

This article examines methods for automatically detecting an author's gender identity, using fiction prose of 1980–2000 as material. In this period a distinct construct known as “women’s prose” emerged, marked by its own genre and stylistic character. We set out to determine whether the concept of “women’s prose” belongs only to extra-textual reality or is also clearly reflected at the level of language. We collected a corpus of texts from 1980–2000 and conducted an experiment that identified the most effective machine learning algorithms for classifying male and female prose. The research focuses on methods for automatically determining the gender identity of authors on the material of prose from 1960 to 2000, and its purpose is to identify the optimal methods for this task. The objectives are to highlight the grammatical and stylistic features of prose from 1960 to 2000, in particular women's prose, and of texts of the 18th–19th centuries; to trace changes in the distribution of parts of speech and punctuation over the specified period; and to conduct an experiment identifying the most effective machine learning algorithm for classifying literary texts. The analysis showed that the parts of speech women and men use most often are nouns, verbs, prepositions, pronominal nouns, conjunctions, and adjectives, which reflects the specifics of literary style. We also analyzed the use of the most common punctuation marks: question mark, exclamation point, comma, colon, semicolon, and period. Women were found to use punctuation more actively as a means of expression in modern literature: the share of exclamation marks, question marks, and commas in women writers' texts is much higher than in men’s texts.
The work also analyzes the distribution of parts of speech and punctuation in literary texts by men and women of the 18th–19th centuries. We ran an experiment to identify the most effective algorithm for determining the author's gender identity. The most effective classifiers of literary texts proved to be the BayesNet and SMO algorithm implementations.
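The punctuation comparison described above can be illustrated with a minimal sketch. The abstract does not say how the "share" of a mark is computed, so the normalization used here (mark count divided by the number of whitespace-separated tokens) is an assumption. The function name `punctuation_shares` and the sample sentence are hypothetical and not taken from the article.

```python
from collections import Counter

# Marks named in the abstract; treating each as a single character is an assumption.
PUNCTUATION_MARKS = ("?", "!", ",", ":", ";", ".")


def punctuation_shares(text: str) -> dict[str, float]:
    """Return each mark's count divided by the number of whitespace-separated tokens.

    This normalization is an illustrative assumption, not the article's stated method.
    """
    tokens = text.split()
    if not tokens:
        # Avoid division by zero for empty input.
        return {mark: 0.0 for mark in PUNCTUATION_MARKS}
    counts = Counter(ch for ch in text if ch in PUNCTUATION_MARKS)
    return {mark: counts[mark] / len(tokens) for mark in PUNCTUATION_MARKS}


if __name__ == "__main__":
    # Hypothetical sample text, not from the study's corpus.
    sample = "Who is there? Nobody, it seems! Wait, wait."
    for mark, share in punctuation_shares(sample).items():
        print(f"{mark!r}: {share:.3f}")
```

Per-text vectors of this kind, together with part-of-speech frequencies, could then serve as input features for a classifier. The article's own feature set and preprocessing are not specified in the abstract.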

More From: NSU Vestnik. Series: Linguistics and Intercultural Communication
