Application of Natural Language Processing Algorithms to the Task of Automatic Classification of Russian Scientific Texts

Aleksandr Romanov, Konstantin Lomotin, Ekaterina Kozlova

doi:10.5334/dsj-2019-037

Open Access

Abstract

This work studies whether modern machine learning methods can be applied to the automatic classification of scientific articles and abstracts. To this end, artificial neural networks, random forests, logistic regression, and support vector machines were examined, taking into account a characteristic feature of scientific texts: a large number of terms specific to particular categories. The stages of data collection and text feature extraction are considered separately. The results are used in developing a decision support system that assigns scientific texts to the department codes or abstract journal codes of the All-Russian Institute of Scientific and Technical Information of the Russian Academy of Sciences.
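The abstract stresses that scientific texts carry many category-specific terms, which the feature extraction stage has to capture. One common way to weight such terms is TF-IDF, which gives a higher weight to terms that occur in fewer documents. The sketch below is a minimal, standard-library illustration of that idea, not the authors' pipeline: the paper's own feature extraction may differ, and the tokenizer (no lemmatization or stop-word removal, which Russian text would normally need), the function names, and the toy corpus are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase Latin/Cyrillic letters and digits.
# A real pipeline for Russian would normally add lemmatization and stop words.
TOKEN_RE = re.compile(r"[a-zа-яё0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalised TF-IDF vector; terms unseen during fitting are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    corpus = [
        "нейронная сеть для классификации текстов",
        "случайный лес и логистическая регрессия",
        "классификация научных текстов",
    ]
    tokenized = [tokenize(doc) for doc in corpus]
    idf = fit_idf(tokenized)
    for term, weight in sorted(tfidf(tokenized[2], idf).items(), key=lambda kv: -kv[1]):
        print(f"{term}: {weight:.3f}")
```

On this toy corpus, the terms that occur only in the third document (классификация, научных) receive a higher weight than текстов, which also appears in the first document. Vectors of this kind can then be passed to any of the classifiers listed in the abstract.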

Highlights

To meet this challenge, machine learning algorithms (such as supervised learning algorithms) are applied.
Automatic text classification is becoming increasingly important because of the growing amount of textual information stored on the Internet.
This study aims to develop a model capable of estimating the probability that a text belongs to a given category of a rubricator, i.e. of working in Decision Support System (DSS) mode.

Summary

Introduction

To meet this challenge, machine learning algorithms (such as supervised learning algorithms) are applied. To be trained, they require a labeled dataset in which every item already has a class label. The work is carried out as part of the development of a text analysis system for the All-Russian Institute of Scientific and Technical Information of the Russian Academy of Sciences (VINITI RAS) (Viniti.ru, 2019). Documents pass through thematic departments, where specialists assign them topic codes in various classification systems. The number of codes in the abstract journals and in the State Rubricator of Scientific and Technical Information (SRSTI) reaches several hundred. The DSS is intended to narrow the set of candidate topics for a text by providing the specialist with an estimated probability for each rubric.
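In DSS mode, the classifier outputs a probability for each rubric, and the specialist sees a short list of likely rubrics instead of several hundred codes. A minimal sketch of that final step follows. It assumes the classifier produces one raw score per rubric (as multinomial logistic regression or a neural network with a softmax output layer does); if the model already outputs probabilities (a random forest's vote fractions, for example), the softmax step can be skipped. The cumulative-probability cutoff `mass=0.9`, the cap `max_k=10`, the function names, and the rubric codes are hypothetical, not values from the paper.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw per-rubric classifier scores into probabilities summing to 1."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {code: math.exp(s - top) for code, s in scores.items()}
    total = sum(exps.values())
    return {code: e / total for code, e in exps.items()}


def shortlist_rubrics(
    probs: dict[str, float], mass: float = 0.9, max_k: int = 10
) -> list[tuple[str, float]]:
    """Rank rubrics by probability and keep the most likely ones until their
    cumulative probability reaches `mass` or `max_k` rubrics have been kept."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    shortlist: list[tuple[str, float]] = []
    covered = 0.0
    for code, p in ranked[:max_k]:
        shortlist.append((code, p))
        covered += p
        if covered >= mass:
            break
    return shortlist


if __name__ == "__main__":
    # Hypothetical rubric codes and scores, for illustration only.
    scores = {"rubric_A": 3.0, "rubric_B": 1.0, "rubric_C": 0.5, "rubric_D": -2.0}
    for code, p in shortlist_rubrics(softmax(scores)):
        print(f"{code}: {p:.2f}")
```

With these example scores, only rubric_A and rubric_B are shown (about 0.82 and 0.11), because together they already cover more than 90% of the probability mass. A cumulative cutoff adapts the list length to how confident the model is, while `max_k` bounds the list when the probabilities are spread thinly.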

Journal: Data Science Journal
Publication Date: Aug 12, 2019
Citations: 19
License type: CC BY 4.0

Similar Papers

Multi-Class brain normality and abnormality diagnosis using modified Faster R-CNN
Kübra Uyar ... Hüseyin Kasap
International Journal of Medical Informatics | VOL. 155 | 16 Sep 2021

Landslide susceptibility assessment using feature selection-based machine learning models
...
Geomechanics and Engineering | VOL. 25 | 01 Jan 2020

How platinum-induced nephrotoxicity occurs? Machine learning prediction in non-small cell lung cancer patients
Shih-Hui Huang ... Hsiang-Yin Chen
Computer Methods and Programs in Biomedicine | VOL. 221 | 26 Apr 2022

Machine Learning Models for Predicting Mortality in 7472 Very Low Birth Weight Infants Using Data from a Nationwide Neonatal Network.
Hyun Jeong Do ... Kyoung Min Moon
Diagnostics | VOL. 12 | 03 Mar 2022
