Abstract

Parallel clustering is a practical platform for handling big data. The literature reports a number of clustering algorithms built on the map-reduce framework, but they do not guarantee effective clusters, which makes knowledge extraction difficult. To provide a more effective clustering method for analyzing big data arriving from distributed systems, this paper proposes the BatDolphin-based Sparse Fuzzy C-Means (BatDol-Sparse FCM) clustering algorithm, which enables optimal selection of the cluster centroids. The distributed big data is managed using the Map-Reduce framework, which embeds the BatDol-Sparse FCM algorithm so that both local and global clustering are executed. The proposed algorithm is implemented using data from the E-khool Learning Management System (LMS) and three medical datasets from the UCI repository. The analysis uses two metrics, clustering accuracy (CA) and dice coefficient (DC). The simulation results show that the proposed parallel clustering scheme outperforms the existing algorithms, with values of 0.96 and 0.9667 for CA and DC, respectively.
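
To illustrate the local and global clustering stages, the following is a minimal sketch and not the proposed method. It uses plain fuzzy c-means, with no sparse term and no BatDolphin optimization, and it replaces the optimized centroid selection with a simple farthest-point initialization. It also assumes that "local" clustering means each mapper clusters its own data partition and that "global" clustering means the reducer clusters the collected local centroids. The abstract does not spell out these details. All function names (`farthest_point_init`, `fcm`, `cluster_partitions`) are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def memberships(point, centroids, m=2.0):
    """Standard FCM membership of one point in every cluster."""
    d = [sq_dist(point, c) for c in centroids]
    zeros = [i for i, v in enumerate(d) if v == 0.0]
    if zeros:  # point coincides with a centroid: full membership there
        return [1.0 if i == zeros[0] else 0.0 for i in range(len(d))]
    inv = [(1.0 / v) ** (1.0 / (m - 1.0)) for v in d]
    total = sum(inv)
    return [v / total for v in inv]

def farthest_point_init(points, k):
    """Assumed stand-in for the paper's optimized centroid selection."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def fcm(points, k, m=2.0, iters=50):
    """Plain fuzzy c-means; returns k centroids."""
    centroids = farthest_point_init(points, k)
    dim = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centroids, m) for p in points]
        updated = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            total = sum(w)
            if total == 0.0:  # no point supports this cluster: keep it
                updated.append(centroids[j])
                continue
            updated.append([
                sum(wi * p[t] for wi, p in zip(w, points)) / total
                for t in range(dim)
            ])
        centroids = updated
    return centroids

def cluster_partitions(partitions, k):
    """Map: local FCM per partition. Reduce: FCM over local centroids."""
    local = [fcm(part, k) for part in partitions]       # map step
    merged = [c for centroids in local for c in centroids]
    return fcm(merged, k)                               # reduce step

if __name__ == "__main__":
    partitions = [
        [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)],
        [(0.5, 0.5), (1, 1), (0, 0.5), (10.5, 10.5), (11, 11), (10, 10.5)],
    ]
    print(cluster_partitions(partitions, 2))
```

In this sketch only the local centroids travel from the mappers to the reducer, which keeps the data exchanged between stages small. A real deployment would run the map step on a distributed framework instead of a list comprehension.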
