Abstract

Clustering ensembles are an effective way to improve the quality of clustering results. However, designing an ensemble is difficult because many factors influence its performance: the types of clustering algorithms, the parameters of those algorithms (e.g., initialization method, initial seed values), the ensemble size, and the use of different samples and/or features of the dataset. In this study, eight different clustering ensembles are designed using several clustering algorithms (k-means, expectation maximization, hierarchical, canopy, and farthest first) and compared in terms of accuracy to assess the impact of these factors. Traditionally, the clustering results produced by all ensemble components are combined to create the final consensus clustering. Unfortunately, some clustering solutions are worse than others and degrade overall performance. To address this problem, this paper proposes an accuracy-based solution selection strategy. In the experimental studies, clustering ensembles built with the proposed solution selection strategy were applied to 14 well-known datasets to determine the optimal ensemble design. According to the experimental results, clustering ensemble strategies significantly outperform single clustering models by better discovering the latent patterns in the data.
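The abstract does not spell out how solutions are scored or merged, so the following is only a minimal sketch of the general idea, not the paper's procedure. It assumes that accuracy can be approximated by purity against known class labels (each cluster mapped to its majority class, which can overestimate accuracy when there are many clusters), that the best `keep` solutions are retained, and that the consensus is formed from a co-association vote with a 0.5 threshold. The function names, the `keep` and `threshold` parameters, and the toy data are all hypothetical.

```python
from collections import Counter

def purity_accuracy(labels, truth):
    """Assumed accuracy proxy: map each cluster to its majority class."""
    by_cluster = {}
    for lab, t in zip(labels, truth):
        by_cluster.setdefault(lab, []).append(t)
    correct = sum(Counter(ts).most_common(1)[0][1] for ts in by_cluster.values())
    return correct / len(truth)

def select_solutions(solutions, truth, keep):
    """Keep the `keep` solutions with the highest accuracy proxy."""
    ranked = sorted(solutions, key=lambda s: purity_accuracy(s, truth), reverse=True)
    return ranked[:keep]

def consensus(solutions, threshold=0.5):
    """Group points that share a cluster in more than `threshold` of the solutions."""
    n, m = len(solutions[0]), len(solutions)
    parent = list(range(n))  # union-find forest over the data points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            together = sum(s[i] == s[j] for s in solutions) / m
            if together > threshold:
                parent[find(i)] = find(j)

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy data: six points, two classes, four base clusterings of varying quality.
truth = [0, 0, 0, 1, 1, 1]
solutions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, different label names
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],  # poor solution
]
selected = select_solutions(solutions, truth, keep=3)
final_labels = consensus(selected)
```

The co-association vote is used here because it is insensitive to how each base clustering names its clusters. Merging every pair above the threshold through union-find is a deliberately simple consensus rule; a real ensemble would more likely cluster the co-association matrix itself.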
