Abstract

The article analyzes the rhythm of texts of different genres: fiction novels, advertisements, scientific articles, reviews, tweets, and political articles. The authors identified lexico-grammatical figures in the texts (anaphora, epiphora, diacope, aposiopesis, etc.) that serve as markers of text rhythm. From these figures they calculated statistical features that describe the rhythm quantitatively and structurally. The resulting text model was visualized for statistical analysis using boxplots and heat maps, which showed differences in the rhythm of texts of different genres. The boxplots showed that almost all genres differ from each other in the overall density of rhythm features. The heat maps showed different rhythm patterns across genres. Further, the rhythm features were successfully used to classify texts into six genres. The classification was carried out in two ways: a binary classification for each genre, separating that genre from the remaining genres, and a multi-class classification of the text corpus into all six genres at once. Two text corpora, in English and Russian, were used for the experiments. Each corpus contains 100 fiction novels, scientific articles, advertisements and tweets, 50 reviews and political articles, i.e. a total of 500 texts. The high quality of the classification with neural networks showed that rhythm features are a good marker for most genres, especially fiction. The experiments were carried out using the ProseRhythmDetector software tool for the Russian and English languages. The text corpora contain 300 texts for each language.
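To make the idea of a rhythm-density feature concrete, here is a minimal Python sketch. It is not the ProseRhythmDetector implementation, whose rules are not described in this abstract. It assumes a simplified definition: anaphora is counted when two adjacent sentences start with the same word, and epiphora when they end with the same word. The function names and the naive sentence splitter are hypothetical and only for illustration.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(sentence):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", sentence.lower())


def rhythm_density(text):
    # Tokenize each sentence and skip sentences without any words.
    sents = [words(s) for s in split_sentences(text)]
    sents = [s for s in sents if s]
    anaphora = epiphora = 0
    # Compare each sentence with the one that follows it.
    for prev, cur in zip(sents, sents[1:]):
        if prev[0] == cur[0]:
            anaphora += 1  # same first word in adjacent sentences
        if prev[-1] == cur[-1]:
            epiphora += 1  # same last word in adjacent sentences
    n = len(sents)
    return {
        "anaphora": anaphora,
        "epiphora": epiphora,
        # Simplified density: detected figures per sentence.
        "density": (anaphora + epiphora) / n if n else 0.0,
    }
```

For example, rhythm_density("We came early. We saw the town. We left at dawn. Nobody stayed in town.") counts two anaphoras over four sentences. Per-text values of this kind are what a boxplot per genre or a genre classifier would take as input.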

Highlights

  • Accepted September 1, 2021. The article is devoted to the analysis of the rhythm of texts of different genres: fiction novels, advertisements, scientific articles, reviews, tweets, and political articles. The authors identified lexico-grammatical figures in the texts: anaphora, epiphora, diacope, aposiopesis, etc., that are markers of the text rhythm

  • The resulting text model was visualized for statistical analysis using boxplots and heat maps that showed differences in the rhythm of texts of different genres. The boxplots showed that almost all genres differ from each other in terms of the overall density of rhythm features

  • Heat maps showed different rhythm patterns across genres


Summary

Text Classification by Genre Based on Rhythm Features

The authors identified lexico-grammatical figures in the texts: anaphora, epiphora, diacope, aposiopesis, etc., that are markers of the text rhythm. The article is devoted to the analysis of the rhythm of texts of different genres: fiction novels, advertisements, scientific articles, reviews, tweets, and political articles. The resulting text model was visualized for statistical analysis using boxplots and heat maps that showed differences in the rhythm of texts of different genres. The high quality of the classification with neural networks showed that rhythm features are a good marker for most genres, especially fiction. Modeling and Analysis of Information Systems, Vol. 28, No. 3, 2021. Journal website: www.mais-journal.ru

Text Classification by Genre Based on Rhythm Features
Nadezhda Stanislavovna Lagutina
Review of related work
Rhythm features
Statistical analysis of rhythm features
Classification by genre
Classification of texts by genre for the Russian language
Discussion of the results from a linguistic point of view
