MapReduce-Based Crow Search-Adopted Partitional Clustering Algorithms for Handling Large-Scale Data

Karthikeyani Visalakshi N, Shanthi S, Lakshmi K

doi:10.4018/ijcini.20211001.oa32


Open Access

https://doi.org/10.4018/ijcini.20211001.oa32


Abstract

Cluster analysis is a prominent data mining technique in knowledge discovery that uncovers hidden patterns in data. K-Means, K-Modes and K-Prototypes are partition-based clustering algorithms that select their initial centroids randomly. Because of this random selection, they tend to converge to locally optimal solutions. To address this issue, the strategy of the Crow Search algorithm is combined with these algorithms to search for the global optimum. With advances in information technology, data sizes have grown drastically, from terabytes to petabytes. To make the proposed algorithms suitable for such voluminous data, they are implemented in parallel with the Hadoop MapReduce framework. The proposed algorithms are evaluated on large-scale data, and the results are compared in terms of cluster evaluation measures and of computation time as the number of nodes varies.
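
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard crow search update rule (a crow follows another crow's remembered position unless that crow is "aware", in which case it moves randomly), uses the K-Means sum of squared errors as the fitness, and only imitates the map and reduce steps with plain Python functions instead of Hadoop. The names `sse_map`, `fitness`, `crow_search`, `ap` (awareness probability) and `fl` (flight length) are chosen for illustration.

```python
import random

def sse_map(chunk, centroids):
    """Map step (simulated): partial sum of squared errors for one data split."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids)
        for point in chunk
    )

def fitness(chunks, centroids):
    """Reduce step (simulated): add the partial errors of all splits."""
    return sum(sse_map(chunk, centroids) for chunk in chunks)

def crow_search(chunks, k, dim, lo, hi, n_crows=10, iters=50, ap=0.1, fl=2.0, seed=0):
    """Search for k centroids with low SSE; all parameter values are assumptions."""
    rng = random.Random(seed)

    def rand_pos():
        # One candidate solution = k centroids, each with `dim` coordinates.
        return [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(k)]

    pos = [rand_pos() for _ in range(n_crows)]
    mem = list(pos)                              # best position each crow remembers
    mem_fit = [fitness(chunks, p) for p in pos]

    for _ in range(iters):
        for i in range(n_crows):
            j = rng.randrange(n_crows)           # crow i follows a random crow j
            if rng.random() >= ap:
                # Crow j is unaware: move toward its remembered position.
                r = rng.random()
                new = [[x + r * fl * (m - x) for x, m in zip(cx, cm)]
                       for cx, cm in zip(pos[i], mem[j])]
            else:
                # Crow j is aware: crow i is sent to a random position.
                new = rand_pos()
            # Accept the move only if it stays inside the search bounds.
            if all(lo <= x <= hi for cen in new for x in cen):
                pos[i] = new
                f = fitness(chunks, new)
                if f < mem_fit[i]:               # update memory on improvement
                    mem[i], mem_fit[i] = new, f

    best = min(range(n_crows), key=lambda i: mem_fit[i])
    return mem[best], mem_fit[best]

# Two data splits, as two mappers might receive them.
chunks = [[(0.0, 0.0), (0.1, 0.0)], [(5.0, 5.0), (5.1, 5.0)]]
best_centroids, best_sse = crow_search(chunks, k=2, dim=2, lo=0.0, hi=6.0)
```

In this sketch the best remembered centroids would typically be handed to K-Means as its starting point, replacing the random initialization that the abstract identifies as the cause of local optima.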

Highlights

Clustering is an unsupervised classification technique that extracts useful knowledge from data without knowing the class labels
The silhouette values obtained over various iterations show that the Parallel CSAK-Means algorithm outperforms Parallel K-Means and Parallel PSOK-Means on all data sets
The Silhouette, F-Measure, Rand Index and Purity values of Parallel CSAK-Means are higher than those of the Parallel K-Means and Parallel PSOK-Means clustering algorithms
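
Two of the evaluation measures named in the highlights, Purity and Rand Index, can be computed as in the following sketch. It uses the textbook definitions and hypothetical function names; the exact variants and data sets used in the paper are not given on this page, so the labels below are made-up illustration data.

```python
from collections import Counter
from itertools import combinations

def purity(labels_true, labels_pred):
    """Fraction of objects that belong to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(labels_true)

def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which the two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[a] == labels_true[b]) == (labels_pred[a] == labels_pred[b])
        for a, b in pairs
    )
    return agree / len(pairs)

# Illustration only: six objects, one of them placed in the wrong cluster.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
p = purity(truth, found)        # 5/6
ri = rand_index(truth, found)   # 10/15
```

Both measures lie in [0, 1], and higher values indicate closer agreement with the known class labels.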

Summary

Introduction

Clustering is an unsupervised classification technique that extracts useful knowledge from data without knowing the class labels. K-Means, K-Modes and K-Prototypes are partition-based clustering algorithms that handle numeric, categorical, and mixed numeric and categorical data objects, respectively. K-Means is one of the most widely used partitional clustering algorithms for numerical data. It has been extended to handle categorical data and mixed numeric and categorical data; these extensions are called K-Modes and K-Prototypes (Huang, 1997, 1998). The authors suggested that each optimization algorithm has its own parameters and that it is tedious to fix optimum values for these parameters. This algorithm can be extended to automatically determine the optimal number of clusters for datasets.
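
The difference between the three algorithms lies mainly in how the distance between an object and a cluster representative is measured. The sketch below follows the usual formulation attributed to Huang: squared Euclidean distance for numeric attributes, a count of mismatches for categorical attributes, and a weighted sum of the two for mixed data. The function names and the weight `gamma` are illustrative assumptions, not taken from the paper.

```python
def kmeans_distance(x_num, centroid):
    """Squared Euclidean distance between numeric attribute vectors."""
    return sum((a - b) ** 2 for a, b in zip(x_num, centroid))

def kmodes_distance(x_cat, mode):
    """Simple matching dissimilarity: number of categorical mismatches."""
    return sum(a != b for a, b in zip(x_cat, mode))

def kprototypes_distance(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Mixed dissimilarity: numeric part plus gamma times categorical part.
    gamma balances the two attribute types and is an assumed parameter here."""
    return (kmeans_distance(x_num, proto_num)
            + gamma * kmodes_distance(x_cat, proto_cat))

# Illustration only: an object with two numeric and two categorical attributes.
d = kprototypes_distance([1.0, 2.0], ["a", "b"], [1.0, 4.0], ["a", "c"], gamma=0.5)
```

Here the numeric part contributes 4.0 and the single categorical mismatch contributes 0.5, so `d` is 4.5.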



Journal: International Journal of Cognitive Informatics and Natural Intelligence
Publication Date: Jul 29, 2021
License type: CC BY 3.0

