Abstract

Consensus clustering aims to find a single partition of data that agrees as much as possible with existing basic partitions. Given its robustness and generalizability, consensus clustering has emerged as a promising solution for finding cluster structures in heterogeneous big data arising from various application domains. In the area of fuzzy systems, however, research along this line is still in its initial stage, with only some unsystematic algorithmic studies. Finding a fuzzy consensus partition from multiple fuzzy basic partitions in an efficient, flexible, and robust way remains an open problem calling for further investigation. In light of this, this paper provides a systematic study of fuzzy consensus clustering (FCC) from a utility perspective. Specifically, we first define the objective function of FCC using a novel fuzzified contingency matrix. We then derive a family of FCC utility functions, termed FCCU, that can transform FCC into a weighted piecewise fuzzy $c$-means clustering (piFCM) problem. This helps us establish an algorithmic framework for FCC with a flexible choice of utility functions, and speeds up FCC significantly through the FCM-like iterative process of piFCM. To meet the big data challenge, we further parallelize FCC on the Spark platform with both vertical and horizontal segmentation schemes. Extensive experiments on various real-world datasets demonstrate the excellent performance of FCC, even with a majority of poor basic partitions. In particular, our method exhibits interesting potential for big data clustering in two real-life applications concerned with online event detection and overlapping community detection, respectively.
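
To make the idea of reducing consensus clustering to an FCM-like iteration more concrete, the following is a minimal sketch and not the paper's piFCM. It assumes the simplest possible setting: each fuzzy basic partition is given as a membership matrix, the matrices are concatenated column-wise (one "piece" per basic partition), and plain fuzzy $c$-means with squared Euclidean distance is run on the result. The function names `fuzzy_c_means` and `consensus_from_fuzzy_partitions`, the `weights` parameter, and the choice of distance are all illustrative assumptions; the abstract does not specify which utility functions or distances FCCU admits.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with squared Euclidean distance.

    Returns (U, V): memberships of shape (n, c) and centroids of shape (c, d).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the rows of X.
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared distances from every row to every centroid, shape (n, c).
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


def consensus_from_fuzzy_partitions(partitions, c, weights=None, **kwargs):
    """Hypothetical helper: FCM on column-wise concatenated membership matrices.

    partitions: list of (n, c_i) fuzzy membership matrices (basic partitions).
    weights:    optional per-partition weights (an assumption of this sketch).
    """
    if weights is None:
        weights = np.ones(len(partitions))
    # Scaling a block by sqrt(w) weights its squared-distance term by w.
    X = np.hstack([np.sqrt(w) * np.asarray(P, dtype=float)
                   for w, P in zip(weights, partitions)])
    U, _ = fuzzy_c_means(X, c, **kwargs)
    return U
```

Calling `consensus_from_fuzzy_partitions([P1, P2, P3], c=2)` returns an `(n, 2)` consensus membership matrix whose rows sum to 1. Because the basic partitions enter only as blocks of columns, they may use different numbers of clusters and arbitrarily permuted cluster labels.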
