METHODS OF CLASSIFICATION OF MACHINE LEARNING FOR CONSTRUCTION OF MATHEMATICAL MODELS ON MULTIMODAL DATA

N Boyko, O Petrovskyi

doi:10.31891/2307-5732-2022-307-2-25-32

Abstract

This article examines topic modeling as an unsupervised machine learning technique. It analyzes how topic modeling methods can determine the topics of documents so that the documents can then be categorized. Latent semantic analysis, probabilistic latent semantic analysis, and latent Dirichlet allocation are considered. The article proposes an approach for building effective topic models of text document collections in Ukrainian and other synthetic languages, based on the peculiarities of this linguistic type, and describes its main stages. The approach consists of a custom input data preprocessing pipeline that covers file loading, text extraction, removal of improper symbols, tokenization, stop-word removal, stemming of each token, and a newly introduced model pruning stage, which makes any of the modern topic modeling methods applicable to synthetic languages. The approach was implemented in Python and used to build a topic model of a collection of Ukrainian-language scientific publications on civic identity and related topics. An expert in political psychology who studies the phenomenon of civic identity was involved in the research to evaluate the quality of the topic model. Following the expert's evaluation of the topics identified during modeling, it was proposed to refine the cluster names based on the semantics of the word sets that form them. Overall, according to the expert, the identified topics represent the concept of an individual's civic identity and, when used to categorize documents, will simplify researchers' work with literature sources on this issue. This supports the effectiveness of the proposed approach.
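The preprocessing stages listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word set, the suffix list, the function names, and the thresholds are all hypothetical, the stemmer is a naive suffix stripper standing in for a real Ukrainian stemmer, and the abstract does not say how the model pruning stage works, so pruning is approximated here by dropping stems with very low or very high document frequency. File loading and text extraction are omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources; a real pipeline would use full lists.
STOP_WORDS = {"і", "та", "в", "на", "що", "це", "до", "з"}
SUFFIXES = ("ості", "ість", "ами", "ого", "ий", "ів", "а", "и", "і", "у", "я")


def clean(text):
    """Lowercase the text and blank out punctuation, digits and underscores."""
    return re.sub(r"[^\w\s'’]|[\d_]", " ", text.lower())


def tokenize(text):
    """Split cleaned text on whitespace."""
    return clean(text).split()


def stem(token):
    """Naive longest-suffix stripping (assumption: stand-in for a real stemmer)."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Keep at least three characters so short words are not destroyed.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, tokenize, drop stop-words, then stem each remaining token."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def prune(docs, min_df=2, max_df_ratio=0.9):
    """Assumed pruning rule: keep stems found in at least min_df documents
    and in at most max_df_ratio of all documents."""
    df = Counter(s for doc in docs for s in set(doc))
    limit = max_df_ratio * len(docs)
    keep = {s for s, n in df.items() if min_df <= n <= limit}
    return [[s for s in doc if s in keep] for doc in docs]


if __name__ == "__main__":
    texts = [
        "Громадянська ідентичність та її розвиток.",
        "Проблеми громадянської ідентичності в Україні.",
        "Тематичне моделювання текстів.",
    ]
    docs = [preprocess(t) for t in texts]
    print(prune(docs, min_df=2, max_df_ratio=0.9))
```

Stemming matters for a synthetic language because one lexeme appears in many inflected forms; in this sketch two case forms of the same noun collapse to a single stem, so they count as one vocabulary entry. The pruned token lists could then be passed to any LSA, pLSA, or LDA implementation.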

More From: Herald of Khmelnytskyi National University. Technical sciences
