Abstract
Although co-association (CA) matrix-based spectral ensemble clustering (SEC) has achieved significant success, it faces two persistent challenges: (1) CA matrix-based SEC can produce imbalanced clusters, and (2) SEC suffers from high time and space complexity because the CA matrix grows quadratically with the number of data points. To address these issues, we propose a method for constructing a doubly stochastic co-association (DSCA) matrix. We theoretically demonstrate that DSCA matrix-based SEC yields more balanced clusters than CA matrix-based SEC. Building on the DSCA matrix, we introduce two novel methods: Fast SEC (FSEC) and Approximate SEC (ASEC). FSEC solves the normalized cut on the DSCA matrix efficiently by exploiting the fact that this matrix is symmetric and doubly stochastic. ASEC performs ensemble clustering on random samples rather than on the entire large dataset. After the ensemble clustering result for a benchmark random sample is obtained, an approximate ensemble clustering result for all data points is derived with our newly proposed probability nearest neighbors (PNN) algorithm. Experiments on real-world and synthetic datasets, evaluated by accuracy (ACC), normalized mutual information (NMI), purity, F1-score, and adjusted Rand index (ARI), confirm the scalability and effectiveness of FSEC and ASEC.
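To make the idea of a doubly stochastic co-association matrix concrete, the sketch below builds a dense CA matrix from several base clusterings, rescales it to be symmetric and doubly stochastic, and computes the spectral embedding used by the normalized cut. The rescaling step is an assumption: it uses the standard symmetric Sinkhorn-Knopp iteration, which may differ from the DSCA construction proposed here. The helper names `co_association`, `symmetric_sinkhorn`, and `spectral_embedding` are hypothetical. The sketch relies on one fact: for a symmetric doubly stochastic matrix W, the degree matrix is the identity, so the normalized Laplacian is I - W and the relaxed normalized cut reduces to the leading eigenvectors of W. It deliberately materializes the full n × n matrix, which is the quadratic cost that FSEC and ASEC aim to avoid. It does not reproduce either method or PNN.

```python
import numpy as np


def co_association(labelings):
    """Dense CA matrix: entry (i, j) is the fraction of base clusterings
    that assign points i and j to the same cluster."""
    labelings = np.asarray(labelings)  # shape (m, n): m base clusterings of n points
    m, n = labelings.shape
    ca = np.zeros((n, n))
    for labels in labelings:
        ca += labels[:, None] == labels[None, :]  # 1.0 where i and j co-occur
    return ca / m


def symmetric_sinkhorn(a, n_iter=10_000, tol=1e-12):
    """Assumed DSCA stand-in: find a positive vector d such that
    diag(d) @ a @ diag(d) is doubly stochastic, via the damped symmetric
    Sinkhorn-Knopp update d <- sqrt(d / (a @ d)). Assumes a is symmetric,
    nonnegative, and has a positive diagonal (true for a CA matrix)."""
    d = np.ones(a.shape[0])
    for _ in range(n_iter):
        d_new = np.sqrt(d / (a @ d))
        converged = np.max(np.abs(d_new - d)) < tol
        d = d_new
        if converged:
            break
    return d[:, None] * a * d[None, :]


def spectral_embedding(w, k):
    """For symmetric doubly stochastic w the degree matrix is I, so the
    normalized Laplacian is I - w and the relaxed normalized cut is
    solved by the k leading eigenvectors of w itself."""
    vals, vecs = np.linalg.eigh(w)  # eigenvalues in ascending order
    return vals[-k:], vecs[:, -k:]


# Usage: three toy base clusterings of six points.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 1, 0, 0],
]
ca = co_association(labelings)
w = symmetric_sinkhorn(ca)
vals, emb = spectral_embedding(w, k=2)
print(np.round(w.sum(axis=1), 6))  # every row of w sums to 1
```

The rows of `emb` would then typically be grouped with k-means to obtain the final labels. That step is omitted here to keep the sketch dependency-free apart from NumPy.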