Abstract
Although many cluster ensemble approaches exist, few of them consider how to deal with noisy datasets. In this paper, we design a new noise-immunization-based cluster ensemble framework, named AP2CE, to tackle the challenges raised by noisy datasets. AP2CE not only takes advantage of the affinity propagation algorithm (AP) and the normalized cut algorithm (Ncut), but also possesses the characteristics of a cluster ensemble. Compared with traditional cluster ensemble approaches, AP2CE is characterized by several properties. (1) It adopts multiple distance functions instead of a single Euclidean distance function to avoid the noise related to the distance function. (2) It applies AP to prune noisy attributes and generates a set of new datasets in the subspaces consisting of the representative attributes obtained by AP. (3) It avoids the explicit specification of the number of clusters. (4) It adopts the normalized cut algorithm as the consensus function to partition the consensus matrix and obtain the final result. Experiments on real datasets show that (1) AP2CE works well on most real datasets, in particular the noisy ones; (2) AP2CE is a better choice for most real datasets when compared with other cluster ensemble approaches; and (3) AP2CE provides more accurate, stable, and robust results.
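To make property (4) more concrete, the following is a minimal sketch of how a consensus matrix could be assembled from several base partitions. It assumes the common co-association definition, in which each entry is the fraction of base partitions that place two samples in the same cluster; the abstract does not state how AP2CE defines its consensus matrix, so this is an illustration and not the authors' method. The function name `consensus_matrix` and the example labels are hypothetical. The Ncut step that would then partition this matrix is not shown.

```python
from typing import List, Sequence


def consensus_matrix(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Build a co-association matrix from base partitions.

    Entry [i][j] is the fraction of base partitions that assign
    samples i and j to the same cluster. This definition is an
    assumption; the abstract does not spell it out.
    """
    if not partitions:
        raise ValueError("need at least one base partition")
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same samples")

    # Count, for every pair of samples, how many partitions co-cluster them.
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    # Normalize the counts by the number of base partitions.
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Hypothetical base partitions of four samples. Cluster ids are arbitrary
# labels, so the first two partitions describe the same grouping.
base_partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = consensus_matrix(base_partitions)
```

Because only co-membership is counted, the matrix is unaffected by how each base partition happens to number its clusters, which is why such a matrix can serve as the similarity graph handed to a graph-partitioning consensus function.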