Abstract

The paper proposes a system which is electronic data storage (of qualification works of students from different countries) and provides the capability to identify and connect young scientists conducting research on a related problem area. The purpose of developing this system is to provide opportunities for knowledge exchange, research in a team on a common problem, as well as to identify scientific trends in different countries. In this paper, the preprocessing methods influence on the work of classifiers such as Logistic Regression, LSTM, BERT, LightGBM was researched. A study was conducted on the speed of classification and F1 assessment. Conclusions. Lemmatization showed to require a shorter operating time compared to steaming by almost twice and a better score by an average of 5 percent, so it was decided to use the Logistic Regression classifier with lemmatization at the stage of text preparation in the subsequent operation of the proposed ISKE.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call