Abstract

This paper addresses the clustering ensemble problem, which aims to combine multiple clusterings into a single, potentially better solution in terms of robustness, novelty and stability. The proposed Iterative Combining Clusterings Method (ICCM) processes the entire dataset iteratively, with each iteration following a two-step framework. In the first step, different clustering algorithms process the common dataset individually; in the second step, a set of sub-clusters is extracted through a voting process among the data objects. To avoid the ambiguity that voting can introduce, only objects that receive a majority vote are assigned to their corresponding sub-clusters. The remaining objects are collected and re-clustered in subsequent iterations. At the end of the iterative process, a clustering algorithm groups the obtained sub-cluster centres to extract the final clusters of the dataset. Two gene expression datasets and three real-life datasets are used to evaluate the proposed approach with external and internal criteria. The experimental results demonstrate the effectiveness and robustness of the proposed method, which improves the Davies-Bouldin (DB) index by up to 16.89% on the Iris dataset and by up to 14.98% on the Wine dataset. The external validity metrics confirm the usefulness of the proposed approach, which achieves the highest average normalized mutual information (NMI) score across the datasets, 81.05%, compared with different clustering ensemble methods.
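The voting step of a single iteration can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how labels from different base clusterings are matched or how a majority is defined, so the greedy overlap-based alignment in `align_to_reference`, the strict "more than half of the voters" rule, and both function names are assumptions made only for illustration. The base clusterings are taken as precomputed label lists, and the re-clustering loop and the final grouping of sub-cluster centres are omitted.

```python
from collections import Counter


def align_to_reference(reference, labels):
    """Relabel `labels` so each cluster takes the reference label it overlaps most.

    Assumption: greedy matching; two clusters may map to the same reference label.
    """
    overlap = {}
    for ref_label, label in zip(reference, labels):
        overlap.setdefault(label, Counter())[ref_label] += 1
    mapping = {label: counts.most_common(1)[0][0] for label, counts in overlap.items()}
    return [mapping[label] for label in labels]


def majority_vote(labelings):
    """Split objects into those with a strict majority vote and the ambiguous rest.

    `labelings` holds one label list per base clustering, all of equal length.
    Returns (assigned, remaining): a dict {object index: sub-cluster label} and
    a list of object indices left for re-clustering in a later iteration.
    """
    reference = labelings[0]
    aligned = [list(reference)]
    aligned += [align_to_reference(reference, labels) for labels in labelings[1:]]

    assigned, remaining = {}, []
    for i in range(len(reference)):
        votes = Counter(labels[i] for labels in aligned)
        label, count = votes.most_common(1)[0]
        if count > len(aligned) / 2:  # strict majority of the base clusterings
            assigned[i] = label
        else:
            remaining.append(i)
    return assigned, remaining


# Three hypothetical base clusterings of six objects (labels are arbitrary ids).
base_labelings = [
    [0, 0, 1, 1, 2, 2],
    [2, 2, 0, 0, 1, 0],
    [0, 0, 1, 1, 2, 0],
]
assigned, remaining = majority_vote(base_labelings)
# Objects 0-4 are agreed on by all three clusterings; object 5 receives three
# different votes after alignment, so it is left in `remaining`.
```

In a full iteration under these assumptions, the objects in `remaining` would be passed back to the base clustering algorithms, and the centres of the sub-clusters collected in `assigned` would be grouped at the end by a final clustering algorithm.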
