Parallel implementation of the greedy heuristic clustering algorithms

L A Kazakovtsev, I P Rozhnov, M V Karaseva, A A Stupina, E A Popov

doi:10.1088/1757-899x/537/2/022052

Open Access

https://doi.org/10.1088/1757-899x/537/2/022052

Abstract

The authors propose parallel greedy heuristic k-means clustering algorithms for implementation on graphics processing units (GPUs) to solve large-scale problems. The computational experiments illustrate the high performance of GPUs compared with running the greedy heuristic algorithms on a central processing unit, which is especially significant for big datasets and large numbers of clusters. The greedy heuristic algorithms retain their efficiency advantage over the standard k-means algorithm.

Highlights

Automatic grouping (clustering) systems are becoming increasingly widespread as the range of data analysis applications expands, including image recognition, medical diagnostics, marketing research, and Internet traffic research [1,2,3].
The k-means problem, along with the very similar p-median problem, is one of the classical problems of location theory [4].
The computational experiments illustrate the high performance of graphics processing units (GPUs) compared with running the greedy heuristic algorithms on a central processing unit, which is especially significant for big datasets and large numbers of clusters.

Summary

Introduction

Automatic grouping (clustering) systems are becoming increasingly widespread as the range of data analysis applications expands, including image recognition, medical diagnostics, marketing research, and Internet traffic research [1,2,3]. The aim of our study is to improve the accuracy of solutions to the k-means problem, obtaining the most accurate (by the value of the objective function) and stable result within a fixed, limited time by using modern parallel GPU (graphics processing unit) systems.

Solutions can be merged in different ways. One such way is elementwise merging [8]: Algorithm 3 (Greedy Procedure #1). Required: two "parent" sets (arrays) of cluster centers S' = {X'1, ..., X'k} and S'' = {X''1, ..., X''k}.

The idea of this work is to implement the algorithms of the Greedy Heuristics Method on GPU systems [8] and to investigate their properties when solving high-dimensional problems. This places fewer demands on flow control, and the high density of calculations and large volumes of data eliminate the need for the large caches that are effective on a CPU.
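The summary names the objective function and a greedy merging of two parent solutions but does not reproduce the steps of Algorithm 3, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes the usual k-means objective (the sum of squared distances from each point to its nearest center) and a simplified greedy reduction: the two parent sets are pooled, and centers are dropped one at a time, each time removing the center whose loss increases the objective least, until k remain. The function names `objective` and `greedy_merge` are hypothetical, and the sketch omits any k-means refinement between removals.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Assumed k-means objective: sum of squared distances to the nearest center.

    Each term depends on a single point only, so the terms can be
    evaluated independently; this is the part that maps naturally to one
    GPU thread per point.
    """
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def greedy_merge(
    points: Sequence[Point],
    parent_a: Sequence[Point],
    parent_b: Sequence[Point],
    k: int,
) -> List[Tuple[float, ...]]:
    """Hypothetical simplified merge: pool both parents, then greedily drop centers.

    Assumption: no k-means refinement is run between removals.
    """
    centers = [tuple(c) for c in parent_a] + [tuple(c) for c in parent_b]
    while len(centers) > k:
        # Try removing each center and keep the removal that hurts least.
        best_i = min(
            range(len(centers)),
            key=lambda i: objective(points, centers[:i] + centers[i + 1:]),
        )
        del centers[best_i]
    return centers
```

Every candidate removal requires a full evaluation of the objective over all points, so the cost grows with both the dataset size and the number of centers. That repeated, uniform computation is the kind of workload where a data-parallel device would be expected to help most.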

Parallel Implementation of Greedy Algorithms

Objective function value

Conclusions