Abstract

Clustering is one of the main tasks of data mining: it discovers groups of similar objects, which makes it possible both to group similar data and to detect outliers. Processing huge datasets requires scalable computation models and distributed computing environments, so efficient parallel clustering methods are needed. The MapReduce processing model is commonly used for parallel data analytics. However, the growing computing power of heterogeneous platforms based on graphics processors (GPUs) and FPGA accelerators makes the CUDA and OpenCL models an interesting alternative to MapReduce. This paper presents a comparative analysis of the effectiveness of the MapReduce and CUDA/OpenCL processing models when applied to clustering. We compare different clustering methods in terms of how well they can be parallelized under each of the two models of computation. The conclusions indicate directions for further work in this area.
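To make "possibilities of parallelization" concrete, the sketch below expresses one iteration of k-means in map/reduce style. This is an illustrative assumption only: the abstract does not say which clustering methods the paper compares, and k-means together with every function name here (`map_phase`, `combine`, `reduce_phase`, `kmeans_step`) is hypothetical. It runs in plain Python with no MapReduce framework; the grouping loop stands in for the shuffle step a framework would perform.

```python
from collections import defaultdict
from functools import reduce

Point = tuple[float, ...]


def nearest(point: Point, centroids: list[Point]) -> int:
    # Index of the closest centroid by squared Euclidean distance.
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points: list[Point], centroids: list[Point]):
    # Map: every point is handled independently and emits (cluster_id, (point, 1)).
    for pt in points:
        yield nearest(pt, centroids), (pt, 1)


def combine(a: tuple[Point, int], b: tuple[Point, int]) -> tuple[Point, int]:
    # Associative merge of partial (sum, count) pairs, so it can run as a combiner
    # on partial data or in any order inside the reducer.
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def reduce_phase(pairs) -> dict[int, Point]:
    # Shuffle (simulated): group the emitted values by cluster id.
    groups: dict[int, list[tuple[Point, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: merge partial sums per cluster and divide to obtain the new mean.
    result: dict[int, Point] = {}
    for key, values in groups.items():
        total, count = reduce(combine, values)
        result[key] = tuple(x / count for x in total)
    return result


def kmeans_step(points: list[Point], centroids: list[Point]) -> list[Point]:
    new = reduce_phase(map_phase(points, centroids))
    # A cluster that received no points keeps its previous centroid.
    return [new.get(i, c) for i, c in enumerate(centroids)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_step(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

The map step is independent per point, so it parallelizes trivially under either model; one common CUDA/OpenCL approach assigns one thread per point for the same assignment step. The reduce step parallelizes only because `combine` is associative, which is the property a clustering method needs for its aggregation stage to scale.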
