Abstract

K-means clustering is a popular unsupervised machine learning method that has been used in diverse applications including image processing, information retrieval, social sciences, and weather forecasting. However, clustering is computationally expensive, especially when applied to large datasets. In this paper, we explore accelerating the performance of K-means clustering using three approaches: 1) shared memory using OpenMP, 2) distributed memory with message passing (MPI), and 3) heterogeneous computing with NVIDIA Graphics Processing Units (GPUs) programmed with CUDA-C. While others have looked at accelerating K-means clustering, this is the first study that compares these different approaches. In addition, K-means performance is very sensitive to the initial means chosen. We evaluate different initializations in parallel and choose the best one to use for the entire algorithm. We evaluate results on a range of images from small (300×300 pixels) to large (1164×1200 pixels). Our results show that all three parallel programming approaches give speed-up, with the best results obtained by OpenMP for smaller images and CUDA-C for larger ones. Each of these approaches gives approximately thirty times overall speed-up compared to a sequential implementation of K-means. In addition, our parallel initialization gives an additional 1.5 to 2.5 times speed-up over the accelerated parallel versions.
