Abstract

Recently, frameworks based on techniques such as random subspace and constraint propagation have been proposed for ensemble clustering of high-dimensional data. Clustering very large datasets is difficult for conventional sequential clustering methods because of the computation time they require. Distributed parallel processing is therefore useful for meeting the performance and scalability requirements of clustering such datasets. In this work, a parallel progressive-based inductive subspace ensemble clustering (PPISEC) algorithm is introduced that uses MapReduce (MR) to cluster high-dimensional data. The clustering method is built on micro-clusters and their correspondence relations, so it is easily parallelised via MR and completes in a small number of MR rounds. In the PPISEC algorithm, the centroid values are selected with the help of an improved support vector machine (ISVM) classifier. The incremental ensemble member chosen (IEMC) process is performed with a fuzzy-based firefly algorithm (FFA), and the normalised cut algorithm is applied to cluster the high-dimensional data. The results show that the proposed PPISEC framework performs well on three high-dimensional benchmark datasets and improves on conventional clustering ensemble approaches.
