Abstract

Parallel clustering provides a platform for handling big data. The literature reports a number of clustering algorithms built on the Map-Reduce framework, but they do not guarantee effective clusters, which makes knowledge extraction difficult. To provide a better and more effective method for clustering big data arriving from distributed systems, this paper proposes a new clustering algorithm, the BatDolphin-based Sparse Fuzzy C-Means (BatDol-Sparse FCM) algorithm, which enables optimal selection of the cluster centroids. The distributed big data are managed with the Map-Reduce framework, into which the BatDol-Sparse FCM algorithm is built so that both local and global clustering are performed. The proposed algorithm is implemented using data from the E-khool Learning Management System (LMS) and three medical datasets from the UCI repository, and its performance is evaluated with clustering accuracy (CA) and the dice coefficient (DC). The simulation results show that the proposed parallel clustering scheme outperforms the existing algorithms, achieving a CA of 0.96 and a DC of 0.9667.
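The abstract does not give the BatDol-Sparse FCM update rules, so the following is only a generic sketch of the local/global idea under stated assumptions: plain fuzzy c-means (fuzzifier m = 2) stands in for the paper's BatDolphin-optimized sparse variant, centroids are seeded with deterministic maximin (farthest-point) selection, and the map and reduce phases are simulated in a single process. The function names `map_local`, `reduce_global`, and `mapreduce_fcm` are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, c):
    """Maximin (farthest-point) seeding: deterministic and keeps seeds apart."""
    centroids = [list(points[0])]
    while len(centroids) < c:
        far = max(points, key=lambda p: min(sq_dist(p, v) for v in centroids))
        centroids.append(list(far))
    return centroids


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Plain fuzzy c-means (assumption: NOT the sparse, BatDolphin-optimized variant)."""
    centroids = init_centroids(points, c)
    u = []
    for _ in range(max_iter):
        # Membership update: u[k][i] = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1))
        u = []
        for p in points:
            # Clamp distances so a point sitting on a centroid does not divide by zero.
            d = [max(math.sqrt(sq_dist(p, v)), 1e-12) for v in centroids]
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                      for i in range(c)])
        # Centroid update: v_i = sum_k u_ki^m * x_k / sum_k u_ki^m
        new = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new.append([sum(wk * p[dim] for wk, p in zip(w, points)) / total
                        for dim in range(len(points[0]))])
        # Stop once no centroid moves more than tol.
        shift = max(math.sqrt(sq_dist(a, b)) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids, u


def map_local(partition, c):
    """Hypothetical 'map' step: local clustering of one data partition."""
    centroids, _ = fcm(partition, c)
    return centroids


def reduce_global(local_centroid_lists, c):
    """Hypothetical 'reduce' step: cluster the pooled local centroids."""
    pooled = [v for lst in local_centroid_lists for v in lst]
    centroids, _ = fcm(pooled, c)
    return centroids


def mapreduce_fcm(data, c, n_partitions):
    """Simulate the pipeline in one process: split, map each part, then reduce."""
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    local = [map_local(part, c) for part in partitions]
    return reduce_global(local, c)
```

For two well-separated groups of points split across two partitions, each map call returns one centroid per group and the reduce call merges the pooled local centroids into the global ones. Only the small set of local centroids crosses the map/reduce boundary, which is what makes the two-level scheme attractive for distributed data.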

