Evaluation of Clustering Algorithms on HPC Platforms

Juan M Cebrian, Jesús Soto, José M Cecilia, Baldomero Imbernón

doi:10.3390/math9172156

Abstract

Clustering algorithms are among the most widely used kernels for generating knowledge from large datasets. These algorithms group a set of data elements (e.g., images, points, patterns) into clusters to identify patterns or common features of a sample. However, they are computationally expensive, as they often involve costly fitness functions that must be evaluated for all points in the dataset. This computational cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on different heterogeneous platforms for fuzzy clustering algorithms commonly used in the state of the art, namely Fuzzy C-means (FCM), Gustafson–Kessel FCM (GK-FCM) and Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that, depending on the computational pattern of each algorithm, its mathematical foundation and the amount of data to be processed, each algorithm performs better on a different platform.
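To make the notion of fuzzy membership concrete, the following is a minimal sketch of the textbook FCM membership update, in which each point receives a degree of membership in every cluster. It is not the authors' implementation: the function name `fcm_memberships`, the fuzzifier default `m=2.0` and the toy data are assumptions chosen for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point i in cluster j.

    Textbook FCM update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)),
    where d_ij is the Euclidean distance from point i to center j.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share the
            # full membership among those centers to avoid dividing by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [
            1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
            for d_j in dists
        ]
        memberships.append(row)
    return memberships


# Hypothetical toy data: three 2-D points and two cluster centers.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centers)
```

Every row of `u` sums to 1, and the middle point belongs mostly, but not exclusively, to the nearer cluster. The cost noted in the abstract is visible in the loop structure: every point is compared against every center on each iteration.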

Highlights

We show the mathematical foundations of the clustering algorithms targeted in this work, including the Fuzzy C-means (FCM), Gustafson–Kessel FCM (GK-FCM) and Fuzzy Minimals (FM) algorithms [20,49,50].
These are well-known fuzzy clustering algorithms that are commonly used in several applications [51].
Instruction-level parallelism is left to the compiler, and we focus our programming efforts on thread-level and data-level parallelism.
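As a rough illustration of thread-level decomposition over data, the sketch below splits the points into chunks, lets each worker accumulate partial sums for the textbook FCM center update, and then reduces the partial results. It is an assumption-laden sketch, not the paper's CPU or GPU code: the helper names `partial_sums` and `update_centers` are hypothetical, and Python threads only show the decomposition pattern, since the interpreter lock prevents a real speed-up for this pure-Python loop.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk, weights, n_clusters, dim):
    """Accumulate weighted coordinate sums and weight totals for one chunk."""
    num = [[0.0] * dim for _ in range(n_clusters)]
    den = [0.0] * n_clusters
    for point, w_row in zip(chunk, weights):
        for j, w in enumerate(w_row):
            den[j] += w
            for d in range(dim):
                num[j][d] += w * point[d]
    return num, den


def update_centers(points, u, m=2.0, workers=2):
    """Textbook FCM center update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m.

    Assumes every cluster has a non-zero total weight.
    """
    n_clusters, dim = len(u[0]), len(points[0])
    w = [[u_ij ** m for u_ij in row] for row in u]
    size = (len(points) + workers - 1) // workers  # ceiling division
    chunks = [
        (points[s:s + size], w[s:s + size])
        for s in range(0, len(points), size)
    ]
    # Each worker processes one chunk independently (no shared writes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(
            pool.map(lambda c: partial_sums(c[0], c[1], n_clusters, dim), chunks)
        )
    # Reduction step: combine the per-chunk partial sums.
    num = [[0.0] * dim for _ in range(n_clusters)]
    den = [0.0] * n_clusters
    for p_num, p_den in parts:
        for j in range(n_clusters):
            den[j] += p_den[j]
            for d in range(dim):
                num[j][d] += p_num[j][d]
    return [[num[j][d] / den[j] for d in range(dim)] for j in range(n_clusters)]


# Hypothetical toy data with crisp memberships so the result is easy to check.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
u = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
new_centers = update_centers(points, u)
```

The same split-then-reduce structure is what a threaded or vectorised implementation would exploit; only the per-chunk work and the reduction need to change for a different platform.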

Summary

Related Work

Clustering methods have been studied in great depth in the literature [26]. However, there are very few parallel implementations that cover different computing platforms. In [37], the authors proposed a classifier for efficiently mining data streams, based on machine learning improved by an efficient drift detector. They focused on hard clustering techniques, where each data element belongs to only one cluster and probabilities are not provided. Other work demonstrated that the FCM clustering algorithm can be improved by the use of static and dynamic single-pass incremental FCM procedures. The authors did not provide a parallel version, but they pointed out that one is mandatory as future work. Téllez-Velázquez and Cruz-Barbosa [46] introduced an inference machine architecture that processes string-based rules and executes them concurrently, following an execution plan created by a fuzzy rule scheduler. This approach explored the parallel nature of rule sets, offering high speed-up ratios without losing generality. [...] GPUs [48] and performs clustering for real environmental sensor data.

Clustering Algorithms

Fuzzy C-Means

Gustafson–Kessel

Fuzzy Minimals

Baseline Implementations

CPU Optimisations

GPU Optimisations

Hardware Environment and Benchmarking

Runtime Evaluation

Energy Evaluation

Scalability with Big Datasets

Findings

Discussion

Conclusions and Future Work

Journal: Mathematics
Publication Date: Sep 4, 2021
Citations: 1
License type: CC BY 4.0
