High Performance Data Clustering: A Comparative Analysis of Performance for GPU, RASC, MPI, and OpenMP Implementations.

Luobin Yang,Michael A Thomas,Wei-Keng Liao,Steve C Chiu

doi:10.1007/s11227-013-0906-y

Abstract

Compared to Beowulf clusters and shared-memory machines, GPU and FPGA are emerging alternative architectures that provide massive parallelism and great computational capabilities. These architectures can be utilized to run compute-intensive algorithms to analyze ever-enlarging datasets and provide scalability. In this paper, we present four implementations of K-means data clustering algorithm for different high performance computing platforms. These four implementations include a CUDA implementation for GPUs, a Mitrion C implementation for FPGAs, an MPI implementation for Beowulf compute clusters, and an OpenMP implementation for shared-memory machines. The comparative analyses of the cost of each platform, difficulty level of programming for each platform, and the performance of each implementation are presented.

Full Text