Abstract

Clustering ensemble techniques have been shown to improve the accuracy and stability of single clustering algorithms. With the development of information technology, the amount of data, such as images, text, and video, has increased rapidly, and clustering these large-scale datasets efficiently is a challenge. Clustering ensembles usually transform the base clustering results into a co-association matrix and then into a graph-partitioning problem. These methods may suffer from information loss because they compute similarity only among samples or only among base clusterings, so the rich information between samples and base clusterings is ignored. Moreover, their results are not discrete: they need post-processing steps to obtain the final clustering result, which may deviate greatly from the true clustering result. To address these problems, we propose a co-clustering ensemble algorithm based on bilateral k-means (CEBKM). Our algorithm simultaneously clusters the samples and the base clusterings of a dataset to fully exploit the potential information between them. In addition, it obtains the final clustering result directly, without using other clustering algorithms. The proposed method outperformed several state-of-the-art clustering ensemble methods in experiments conducted on real-world and toy datasets.

Highlights

  • Clustering techniques are applied in various fields, such as biology [1], image retrieval [2], information retrieval [3], and image processing [4]

  • Graph-based methods usually transform clustering results into a co-association matrix. This step is very important in clustering ensemble methods because it determines how well graph partitioning can be conducted for the final consensus partition

  • Unlike the traditional ensemble methods that only work on the sample or base clusterings of the datasets


Summary

INTRODUCTION

Clustering techniques are applied in various fields, such as biology [1], image retrieval [2], information retrieval [3], and image processing [4]. In the consensus function step, multiple base clusterings are combined into a matrix to improve the accuracy of the final clustering result [22]–[26]. Graph-based methods usually transform clustering results into a co-association matrix. This step is very important in clustering ensemble methods because it determines how well graph partitioning can be conducted for the final consensus partition. However, the co-association matrix only captures the similarity between data points, ignoring the rich information between samples and base clusterings. Many existing clustering ensemble methods also require post-processing steps to obtain the result. Some, such as CSPA, HGPA, and MCLA, use a k-means or METIS algorithm to obtain the final clustering result.
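To make the co-association idea concrete, here is a minimal Python sketch. It is not the CEBKM algorithm and is not taken from the paper; the function names `membership_matrix` and `co_association` and the toy labels are assumptions chosen for illustration. It builds a binary sample-by-cluster membership matrix from several base clusterings and derives from it the standard co-association matrix, whose entry (i, j) is the fraction of base clusterings that place samples i and j in the same cluster.

```python
import numpy as np

def membership_matrix(base_clusterings):
    """Stack the one-hot cluster indicators of every base clustering.

    Returns an (n_samples, total_clusters) binary matrix that links each
    sample to the clusters it belongs to across all base clusterings.
    """
    blocks = []
    for labels in base_clusterings:
        labels = np.asarray(labels)
        clusters = np.unique(labels)
        # One column per cluster of this base clustering.
        blocks.append((labels[:, None] == clusters[None, :]).astype(float))
    return np.hstack(blocks)

def co_association(base_clusterings):
    """Fraction of base clusterings in which two samples share a cluster."""
    H = membership_matrix(base_clusterings)
    return H @ H.T / len(base_clusterings)

# Hypothetical toy input: three base clusterings of four samples.
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
H = membership_matrix(base)
C = co_association(base)
print(H.shape)
print(np.round(C, 2))
```

Note that the co-association matrix collapses the membership matrix into sample-to-sample similarities only. The sample-to-cluster links held in `H` are the kind of information that the text describes as being ignored by this transformation.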

RELATED WORK
HYBRID BIPARTITE GRAPH FORMULATION
CLUSTERING ENSEMBLE
EXPERIMENTS
1) EVALUATION CRITERIA
CONCLUSION
