Abstract

Clustering algorithms partition a dataset into clusters whose members share common features. Clustering has applications in computer vision, data mining, market segmentation, and other fields. The k-means algorithm is one of the most popular clustering algorithms; it represents each cluster by the mean of its members, which serves as the cluster's prototype. In this paper, we explore accelerating k-means clustering on NVIDIA Graphics Processing Units (GPUs) programmed in CUDA C. We apply several optimization techniques, such as storing image data in shared memory and cluster data in constant memory. Performance is evaluated on images ranging from small ($256\times 256$ pixels) to large ($1024\times 1024$ pixels), with the number of clusters ranging from 4 to 256. We find that, on average, the parallel implementation achieves a 9x speedup over the sequential version for 4 clusters, and the speedup increases to 57x as the number of clusters increases to 256. This implementation also outperforms a reference implementation from Northwestern University/UC Berkeley.
