Kernelized Spectral Clustering based Conditional MapReduce function with big data

K Maheswari,M Ramakrishnan

doi:10.1080/1206212x.2019.1587892

Abstract

Clustering is the significant data mining technique for big data analysis, where large volume data are grouped. The resulting of clustering is to minimize the dimensionality while accessing large volume of data. The several data mining techniques have been developed for clustering the data. But the problem of clustering becomes increasing rapidly in recent years since the existing clustering algorithm failed to minimize the clustering time and majority of techniques require huge memory to perform clustering task. In order to improve clustering accuracy and minimize the dimensionality, a Kernelized Spectral Clustering based Conditional Maximum Entropy MapReduce (KSC-CMEMR) technique is introduced. The number of data is collected from big dataset. The KSC-CMEMR technique partitions the data into different clusters using Kernelized Spectral Clustering Process based on the spectrum of similarity matrix and to perform dimensionality reduction. Based on the similarity, the Kernelized Spectral Clustering is carried out with higher clustering accuracy. After that, Conditional Maximum Entropy MapReduce model eliminates the irrelevant data present in the cluster. The designed model predicts the maximum probabilities of data become a member of the cluster and remove the irrelevant data from the cluster. This helps to reduce the false positive and space complexity. Experimental evaluation is carried out with certain parameters such as clustering accuracy, clustering time, false positive rate, and space complexity with respect to the number of data. The experimental results reported that the proposed KSC-CMEMR technique obtains high clustering accuracy with minimum time as well as space complexity.

Full Text