A Technique for Determining the Age of a Text's Author Based on Readability and Lexical Diversity Metrics

A. A. Sobolev, A. S. Romanov, A. M. Fedotova, A. A. Shelupanov, A. V. Kurtukova

doi:10.21293/1818-0442-2022-25-2-45-52

Abstract

This article describes approaches to determining the age of the author of an anonymous text written in Russian. Fundamental works in the field are reviewed, and both established approaches (support vector machine, naive Bayes classifier, convolutional and recurrent neural networks) and more recent methods (fastText, BERT) are implemented. The study uses the authors' own dataset of 1.5 million comments from social media users. A separate experiment assesses how different text vectorization methods affect classification accuracy. A series of experiments evaluating the efficiency of these methods and selecting informative features yielded a model that predicts the age of the author of an anonymous text with an accuracy of 83.2%.
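
The title refers to readability and lexical diversity metrics, but the abstract does not say which ones were used. As a rough illustration only, the sketch below computes three simple per-text features: the type-token ratio (a common lexical diversity measure) and average sentence and word length (crude readability proxies). The choice of features and the names `tokenize`, `split_sentences`, and `text_features` are assumptions made for this example, not the feature set from the paper.

```python
import re


def tokenize(text):
    # Lowercased runs of letters; the pattern is Unicode-aware,
    # so Cyrillic words are matched as well as Latin ones.
    return re.findall(r"[^\W\d_]+", text.lower())


def split_sentences(text):
    # Naive split on sentence-final punctuation (an assumption;
    # abbreviations and ellipses are not handled specially).
    parts = re.split(r"[.!?]+", text)
    return [p for p in parts if p.strip()]


def text_features(text):
    # Returns illustrative features for one comment.
    words = tokenize(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        return {
            "type_token_ratio": 0.0,
            "avg_sentence_length": 0.0,
            "avg_word_length": 0.0,
        }
    return {
        # Distinct words divided by total words (lexical diversity).
        "type_token_ratio": len(set(words)) / len(words),
        # Words per sentence (readability proxy).
        "avg_sentence_length": len(words) / len(sentences),
        # Letters per word (readability proxy).
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }


if __name__ == "__main__":
    print(text_features("Мама мыла раму. Мама устала!"))
```

Features of this kind could be fed to any of the classifiers listed in the abstract, for example a support vector machine, alongside or instead of a vectorized text representation.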

Journal: Proceedings of Tomsk State University of Control Systems and Radioelectronics
Publication Date: Jan 1, 2022
Citations: 2
