Abstract

<span lang="EN-US">Large datasets have become useful in data mining for processing, storing, and handling vast amounts of data. However, handling and processing large datasets is time-consuming and memory intensive. As a result, the researchers adopted a partitioning strategy to improve controllability and performance and reduce the time and memory required to handle large datasets. Unfortunately, the numerous clustering techniques available in the literature could confuse experts in choosing the best techniques for a given dataset. Furthermore, no clustering technique can tackle all problems, such as cluster structure, noise, or density. To manage large datasets, existing clustering techniques need scalable solutions. Therefore, this paper proposes an ensemble partition-based clustering with a majority voting technique for large dataset partitioning using the aggregation of k-means, k-medoids, fuzzy c-means, expectation-maximization (EM) and density-based spatial clustering of applications with noise (DBSCAN) techniques. These techniques cluster the large dataset individually in the first stage. The final clusters are discovered in the next stage through a majority voting technique among the five clustering algorithms. These five clustering algorithms assigned data instances to the cluster with the most votes. The experimental findings demonstrate that the ensemble partition-based clustering method surpasses the other five clustering algorithms in terms of execution time and accuracy.</span>

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call