An Analytic Survey on MapReduce based K-Means and its Hybrid Clustering Algorithms

Utkarsha Bagde, Priyanka Tripathi

doi:10.1109/iccmc.2018.8488104

Abstract

Data clustering, the common technique of grouping similar data into clusters, is one of the challenging tasks of today’s era. Traditional clustering algorithms are effective for handling large amounts of data that come from various sources such as social media, business, and the internet. However, the time complexity of the serial computation in these traditional algorithms is very high. The K-Means algorithm is sensitive to the choice of initial points, is prone to converging to local optima, and often has to be run many times to settle on a value of K. K-Harmonic Means, in contrast, is insensitive to the initialization of the centers and is suitable for large-scale datasets. To overcome these defects of traditional clustering algorithms, a hybrid method is suggested in this paper. MapReduce is a parallel programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster. In this paper, observations are given based on different MapReduce-based algorithms, and a new hybrid clustering algorithm based on MapReduce is proposed on the basis of those observations.

Full Text