Abstract

We evaluate a novel implementation of a Self-Organizing Map (SOM) on a Graphics Processing Unit (GPU) cluster. Using various combinations of OpenCL, CUDA, and two different graphics cards, we demonstrate the scalability of the SOM implementation on one to eight GPUs. Results indicate that while the algorithm scales well with the number of training samples and the map size, the benefits of the data-parallel approach offered by the GPU are severely limited when combined with the Message Passing Interface (MPI) in this setting, and are comparable to the speedups of single-GPU implementations over optimized sequential code. Achieved speedups range from 3 to 32 across the various map and training-data sizes. We also observed a performance penalty for the OpenCL implementation compared to CUDA.
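The abstract refers to the data-parallel core of SOM training that a GPU accelerates: for every sample, find the best-matching unit (BMU) across the whole map, then apply a neighborhood-weighted batch update. The paper's actual OpenCL/CUDA kernels are not shown here; the following is only a minimal NumPy sketch of one batch-SOM step, with hypothetical names (`som_batch_step`, `sigma` for the Gaussian neighborhood radius), to illustrate the computation being parallelized.

```python
import numpy as np

def som_batch_step(weights, data, sigma):
    """One batch update of a SOM (illustrative sketch, not the paper's code).

    weights: (rows, cols, dim) codebook vectors on a 2-D grid.
    data:    (n, dim) training samples.
    sigma:   Gaussian neighborhood radius in grid units.
    """
    rows, cols, dim = weights.shape
    flat = weights.reshape(-1, dim)                    # (rows*cols, dim)
    # BMU search: squared distance from every sample to every map unit.
    # This all-pairs step is what the GPU computes in parallel.
    d2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)                            # BMU index per sample
    # Grid coordinates of each unit, and of each sample's BMU.
    grid = np.array([(r, c) for r in range(rows)
                            for c in range(cols)], dtype=float)
    gd2 = ((grid[bmu][:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-gd2 / (2.0 * sigma ** 2))              # (n, rows*cols) neighborhood
    # Batch rule: each unit becomes the h-weighted mean of the data.
    num = h.T @ data                                   # (rows*cols, dim)
    den = h.sum(axis=0)[:, None] + 1e-12
    return (num / den).reshape(rows, cols, dim)
```

In a multi-GPU/MPI setting such as the one evaluated, each rank would typically compute the partial sums `num` and `den` over its local shard of `data` and combine them with an all-reduce before the division, which is where the communication overhead discussed in the results arises.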

