Development of Tools for Creating Parallel Data Mining Algorithms

Karshiyev Zaynidin, Sattarov Mirzabek

doi:10.37934/araset.39.1.2642

Open Access

Abstract

The purpose of this work is to develop tools for building parallel data mining algorithms that execute in a distributed environment. A formal model of a data mining algorithm is proposed in which the algorithm is represented as a set of independent operations that change the state of the knowledge model, together with structural blocks that allow the structure of the algorithm to be modified, including for parallel execution. A method for creating parallel data mining algorithms is proposed that, in contrast to existing ones, decomposes the algorithm into thread-safe functional blocks and allows parallelization both by changing the structure of the parallel algorithm and by configuring its execution. A methodology for parallelizing data mining algorithms is proposed that differs from known ones in that it applies the proposed method to sequential analysis algorithms while taking into account the characteristics of the distributed environment. To create parallel data mining algorithms, software templates are proposed that are built on the formal model and separate the implementation of the algorithm from the distributed execution tools. A library of parallel data mining algorithms, including the proposed templates, has been developed for execution in a distributed environment.
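
The idea of functional blocks that update a knowledge model, combined by structural blocks that decide how they run, can be illustrated with a minimal Python sketch. It is not the authors' library or templates: the names `Model`, `Block`, `count_items`, `merge_models`, `sequential`, and `parallel` are hypothetical, the knowledge model is assumed to be a simple dictionary of item counts, and local threads stand in for workers of a distributed environment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical types for illustration only: the "knowledge model" is a dict of
# item counts, and a functional block maps (data, model) to a new model
# without mutating its inputs, which keeps it safe to run from several threads.
Model = Dict[str, int]
Block = Callable[[Sequence[str], Model], Model]


def count_items(data: Sequence[str], model: Model) -> Model:
    """Functional block: return a new model with the counts from `data` added."""
    updated = dict(model)
    for item in data:
        updated[item] = updated.get(item, 0) + 1
    return updated


def merge_models(models: List[Model]) -> Model:
    """Combine partial models by summing the counts for each item."""
    merged: Model = {}
    for partial in models:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return merged


def sequential(blocks: List[Block]) -> Block:
    """Structural block: run the given blocks one after another."""
    def run(data: Sequence[str], model: Model) -> Model:
        for block in blocks:
            model = block(data, model)
        return model
    return run


def parallel(block: Block, workers: int) -> Block:
    """Structural block: run `block` on data partitions, then merge the results."""
    def run(data: Sequence[str], model: Model) -> Model:
        # Split the data into `workers` interleaved partitions.
        chunks = [data[i::workers] for i in range(workers)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Each worker starts from an empty model, so no state is shared.
            partials = list(pool.map(lambda chunk: block(chunk, {}), chunks))
        # Fold the partial models into the incoming model.
        return merge_models([model] + partials)
    return run
```

In this sketch the functional block `count_items` is written once, and switching from `sequential([count_items])` to `parallel(count_items, workers=3)` changes only the structure of the algorithm and its execution settings. Both produce the same model for the same data because the merge step is a plain sum, which is an assumption that holds for counting but not for every data mining algorithm.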