Ensemble based Distributed K-Modes Clustering

Ijerd, Karthikeyani Visalakshi N

doi:10.6084/m9.figshare.1372349.v1

Abstract

Clustering is the unsupervised classification of data items into groups. With the rapid growth in the number of autonomous data sources, there is an emerging need for effective approaches to distributed clustering. A distributed clustering algorithm clusters distributed datasets without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and its speed on large datasets, but it cannot directly handle categorical attributes, which commonly occur in real-life datasets. Huang proposed the K-Modes clustering algorithm, which introduces a new dissimilarity measure for clustering categorical data. This algorithm replaces the means of clusters with modes and uses a frequency-based method to update the modes during clustering so as to minimize the cost function. Most of the distributed clustering algorithms found in the literature cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited both to handling categorical datasets and to performing distributed clustering in an asynchronous manner. The performance of the proposed algorithm is compared with that of existing distributed K-Means clustering algorithms and of a K-Modes based Centralized Clustering algorithm. The experiments are carried out on various datasets from the UCI machine learning data repository.
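
The K-Modes idea described in the abstract (a dissimilarity measure for categorical data plus a frequency-based mode update) can be illustrated with a minimal sketch. It assumes the simple matching dissimilarity usually associated with K-Modes, that is, a count of mismatched attributes, and it covers only the centralized, single-site procedure. It is not the Ensemble based Distributed algorithm proposed in the paper, whose details the abstract does not give. The function names, the caller-supplied initial modes, and the toy data are hypothetical and chosen for illustration only.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def update_mode(records):
    """Frequency-based update: most frequent category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(data, initial_modes, max_iter=100):
    """Centralized K-Modes sketch; initial modes are supplied by the caller."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest mode.
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(record, modes[j]))
            for record in data
        ]
        if new_labels == labels:  # no record changed cluster: converged
            break
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(len(modes)):
            members = [r for r, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = update_mode(members)
    # Cost function: total dissimilarity of records to their cluster modes.
    cost = sum(matching_dissimilarity(r, modes[lab])
               for r, lab in zip(data, labels))
    return labels, modes, cost


# Hypothetical toy data: four records with two categorical attributes.
toy_data = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
labels, modes, cost = k_modes(toy_data, initial_modes=[("a", "x"), ("b", "z")])
```

Because both steps work on category counts rather than arithmetic means, the same loop structure as K-Means applies to data with no numeric ordering.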

Similar Papers

Applications of clustering algorithms and self organizing maps as data mining and business intelligence tools on real world data sets
L Singh ... P K Dubey
01 Dec 2010

An iterative initial-points refinement algorithm for categorical data clustering
Ying Sun ... Zhengxin Chen
Pattern Recognition Letters | VOL. 23
06 Dec 2001

Initialization of K-modes clustering using outlier detection techniques
Feng Jiang ... Yuefei Sui
Information Sciences | VOL. 332
06 Nov 2015

A Primer on Machine Learning.
Audrene S Edwards ... Tun Jie
Transplantation | VOL. 105
18 Aug 2020
