Abstract

Clustering large datasets is a major research issue because of the long processing time involved. Clustering methods aim to split a large dataset into a number of groups, each consisting of similar data points. Numerous clustering methods have been presented, including partition-based, hierarchical, density-based, spectral, and subspace clustering. These methods fail to produce true (accurate) clusters with a low response time. To mitigate these issues, in this paper we propose a novel Map-Scan-Reduce based density peaks (DP) clustering approach for large datasets. MapReduce is a popular distributed processing framework with several advantages: it can handle the issues that arise with large data volumes, and it partitions data in a distributed way. However, native MapReduce has certain drawbacks, including high communication and computation overhead. The proposed Map-Scan-Reduce process is divided into three steps, MAP, SCAN, and REDUCE, which address the problems of native MapReduce. User scheduling is implemented using an adaptive neuro-fuzzy scheduler, and data preprocessing is implemented using an improved version of M-Z-D-S (Max–Min, Z-score, and decimal scaling). Furthermore, cluster privacy is protected using a differential privacy method, and cluster quality is validated using two metrics, Silhouette and Dunn, for inter-cluster and intra-cluster validation, respectively. Finally, we conduct experiments to analyze the performance of the proposed work in terms of clustering accuracy, speedup ratio (execution time), and efficiency (precision, adjusted Rand index, and normalized mutual information).
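For orientation, the three normalization schemes named in M-Z-D-S can be sketched in their standard textbook forms. This is only an illustrative sketch: the abstract does not describe how the improved version modifies or combines them, and the function names `min_max`, `z_score`, and `decimal_scaling` are hypothetical, not taken from the paper.

```python
from statistics import mean, pstdev


def min_max(values):
    """Max-Min normalization: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Decimal scaling: divide by the smallest power of 10 that makes max |v| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]
```

Each function operates on a single feature column given as a list of numbers; in a distributed setting the column statistics (minimum, maximum, mean, standard deviation) would first have to be aggregated across partitions, which this sketch does not attempt.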
