Abstract

Random projection is a popular method for dimensionality reduction because of its simplicity and efficiency. In the past few years, cluster ensemble approaches based on random projection and fuzzy c-means have been developed for clustering high-dimensional data. However, they require a large amount of space to store a big affinity matrix, and clustering on this affinity matrix incurs a large computation time. In this paper, we propose a new cluster ensemble framework for high-dimensional data based on random projection and fuzzy c-means. Our framework uses cumulative agreement to aggregate fuzzy partitions. The fuzzy partitions obtained from the random projections are ranked using external and internal cluster validity indices. The best partition in the ranked queue is the core (or base) partition. The remaining partitions then provide cumulative inputs to the core, yielding a consensus partition built from the ensemble. Experimental results on Gaussian mixture datasets and a variety of real datasets demonstrate that our approach outperforms three state-of-the-art methods in terms of accuracy and space and time complexity. Our algorithm runs one to two orders of magnitude faster than other state-of-the-art algorithms.
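
The pipeline outlined above (project, cluster, rank, then fold the remaining partitions into a core) can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration only: the function names are hypothetical, the projection is a plain Gaussian one, the ranking uses only the partition coefficient (a single internal validity index), and "cumulative agreement" is approximated by aligning each partition's clusters to the running consensus and taking a running mean. The alignment tries every permutation of clusters, so it is only practical for a small number of clusters.

```python
import itertools
import numpy as np


def random_projection(X, k, rng):
    # Gaussian random projection from d dimensions down to k (assumed variant).
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R


def fuzzy_c_means(X, c, rng, m=2.0, n_iter=100, tol=1e-6):
    # Standard fuzzy c-means; returns an n x c membership matrix with rows summing to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U


def partition_coefficient(U):
    # Internal validity index: closer to 1 means a crisper partition.
    return float((U ** 2).sum() / U.shape[0])


def align_to(reference, U):
    # Reorder the columns of U to agree best with the reference partition.
    c = U.shape[1]
    best = max(
        itertools.permutations(range(c)),
        key=lambda p: float((reference * U[:, list(p)]).sum()),
    )
    return U[:, list(best)]


def rp_fcm_ensemble(X, c, n_projections=10, k=10, seed=0):
    rng = np.random.default_rng(seed)
    partitions = [
        fuzzy_c_means(random_projection(X, k, rng), c, rng)
        for _ in range(n_projections)
    ]
    # Rank partitions; the top-ranked one is the core (assumed ranking rule).
    ranked = sorted(partitions, key=partition_coefficient, reverse=True)
    consensus = ranked[0].copy()
    # Fold the remaining partitions into the core as a running mean.
    for t, U in enumerate(ranked[1:], start=1):
        consensus = (t * consensus + align_to(consensus, U)) / (t + 1)
    return consensus
```

Under these assumptions the sketch only ever holds n x c membership matrices, never an n x n affinity matrix, which is the space saving the abstract emphasizes. Calling `rp_fcm_ensemble(X, c=3)` on an n x d array returns a consensus membership matrix, and `argmax` along axis 1 gives hard labels.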
