Abstract

A clustering ensemble aims to combine multiple clustering models to produce a better result than that of the individual clustering algorithms in terms of consistency and quality. In this paper, we propose a clustering ensemble algorithm with a novel consensus function named Adaptive Clustering Ensemble. It employs two similarity measures, cluster similarity and a newly defined membership similarity, and works adaptively through three stages. The first stage is to transform the initial clusters into a binary representation, and the second is to aggregate the initial clusters that are most similar based on the cluster similarity measure between clusters. This iterates itself adaptively until the intended candidate clusters are produced. The third stage is to further refine the clusters by dealing with uncertain objects to produce an improved final clustering result with the desired number of clusters. Our proposed method is tested on various real-world benchmark datasets and its performance is compared with other state-of-the-art clustering ensemble methods, including the Co-association method and the Meta-Clustering Algorithm. The experimental results indicate that on average our method is more accurate and more efficient.

Highlights

  • In the context of machine learning, an ensemble is generally defined as “a machine learning system that is constructed with a set of individual models working in parallel, whose outputs are combined with a decision fusion strategy to produce a single answer for a given problem” [44]

  • The only difference is that on average the Adaptive Clustering Ensemble (ACE) achieved the best performance, along with the Dual-Similarity Clustering Ensemble method (DSCE) algorithm, measured by Normalised Mutual Information (NMI). Under this experimental set-up, i.e. with a fixed value for k for each dataset, ACE does not show a superiority to its predecessor DSCE, it does in comparison to the other methods

  • That is why we extended DSCE to ACE to cope with variable numbers of clusters generated by the members

Read more

Summary

Introduction

In the context of machine learning, an ensemble is generally defined as “a machine learning system that is constructed with a set of individual models working in parallel, whose outputs are combined with a decision fusion strategy to produce a single answer for a given problem” [44]. In unsupervised learning, there is normally no prior knowledge about the underlying structure or about any particular properties that we want to find or about what we consider good solutions for the data [23, 38]. Different clustering algorithms may produce different clustering results for the same data by imposing a particular structure onto the data. There is no single clustering algorithm that can perform consistently well for different problems and there are no clear guidelines to follow for choosing individual clustering algorithms for a given problem

Objectives
Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.