Abstract

Clustering algorithms are among the most widely used kernels for extracting knowledge from large datasets. These algorithms group a set of data elements (e.g., images, points, patterns) into clusters to identify patterns or common features of a sample. However, they are computationally expensive, as they often involve costly fitness functions that must be evaluated for every point in the dataset. The cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on different heterogeneous platforms for fuzzy clustering algorithms commonly used in the state of the art: Fuzzy C-means (FCM), Gustafson–Kessel FCM (GK-FCM) and Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that, depending on the computational pattern of each algorithm, its mathematical foundation and the amount of data to be processed, each algorithm performs better on a different platform.

Highlights

  • We show the mathematical foundations of the clustering algorithms targeted in this work, including the Fuzzy C-means (FCM), Gustafson–Kessel FCM (GK-FCM) and the Fuzzy Minimals (FM) algorithms [20,49,50]

  • These algorithms are well-known fuzzy clustering algorithms and they are commonly used in several applications [51]

  • Instruction-level parallelism is left to the compiler, and we focus our programming efforts on thread and data-level parallelism
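
To make the fuzzy setting concrete, the following is a minimal sketch of the textbook FCM membership update, in which every point receives a degree of membership in every cluster. It is not taken from the paper: the function name `fcm_memberships`, the fuzzifier default `m=2.0`, the Euclidean distance and the toy data are all assumptions chosen for illustration. Each point's row is computed independently of the others, which is the kind of loop that thread and data-level parallelism can split.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j.

    Textbook FCM update with Euclidean distance; m is the fuzzifier (assumed 2.0).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:  # iterations are independent, so this loop is parallelisable
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships

# Hypothetical toy data: three 2-D points and two cluster centers.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centers)
```

In this toy run the middle point lies at distances 1 and 3 from the two centers, so it belongs mostly, but not exclusively, to the first cluster. A hard clustering method would assign it to that cluster alone.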


Summary

Related Work

Clustering methods have been studied in great depth in the literature [26]. However, there are very few parallel implementations that cover different computing platforms. In [37], the authors proposed a classifier for efficiently mining data streams, based on machine learning improved by an efficient drift detector. They focused on hard clustering techniques, where each data element belongs to exactly one cluster and probabilities are not provided. Other work demonstrated that the FCM clustering algorithm can be improved by static and dynamic single-pass incremental FCM procedures. The authors did not provide a parallel version, although they pointed out that one is mandatory in future work. Téllez-Velázquez and Cruz-Barbosa [46] introduced an inference machine architecture that processes string-based rules and executes them concurrently according to an execution plan created by a fuzzy rule scheduler. This approach explored the parallel nature of rule sets, offering high speed-up ratios without losing generality. [...] GPUs [48] and performs clustering for real environmental sensor data.

Clustering Algorithms
Fuzzy C-Means
Gustafson–Kessel
Fuzzy Minimals
Baseline Implementations
CPU Optimisations
GPU Optimisations
Hardware Environment and Benchmarking
Runtime Evaluation
Energy Evaluation
Scalability with Big Datasets
Findings
Discussion
Conclusions and Future Work