Abstract

The synchronization-inspired clustering algorithm (Sync) can accurately cluster datasets of arbitrary shape, density, and distribution. However, high-dimensional datasets, with their high noise and high redundancy, pose new challenges for Sync: clustering time increases significantly and clustering accuracy decreases. To address these challenges, this paper develops an enhanced synchronization-inspired clustering algorithm, SyncHigh, that clusters high-dimensional datasets quickly and accurately. First, a dimension purification strategy based on Principal Component Analysis (PCA) is designed to find the principal components among all attributes. Second, a density-based data merge strategy is constructed to reduce the number of objects participating in the synchronization-inspired clustering, thereby reducing clustering time. Third, the Kuramoto model is enhanced to handle the mass differences between objects introduced by the density-based data merge strategy. Finally, extensive experiments on synthetic and real-world datasets demonstrate the effectiveness and efficiency of SyncHigh.
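
The second and third steps of the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the grid-based merge is a simple stand-in for the density-based data merge strategy, the mass-weighted update is one plausible way to extend the Kuramoto-style coupling used in synchronization-inspired clustering, and the names `merge_by_grid`, `sync_step`, `synchronize`, `cell`, and `eps` are hypothetical. The PCA step is omitted, and coordinates are assumed to be scaled so that differences inside a neighbourhood are small enough for the sine coupling to stay monotone.

```python
import math

def merge_by_grid(points, cell):
    """Stand-in merge step (assumption): points sharing a grid cell collapse
    into one representative, their centroid, with mass = number of points."""
    cells = {}
    for p in points:
        key = tuple(math.floor(v / cell) for v in p)
        cells.setdefault(key, []).append(p)
    reps, masses = [], []
    for members in cells.values():
        n = len(members)
        reps.append([sum(col) / n for col in zip(*members)])
        masses.append(n)
    return reps, masses

def sync_step(reps, masses, eps):
    """One mass-weighted Kuramoto-style update (assumed form): every
    representative moves toward the objects within distance eps, and
    heavier neighbours pull harder. The neighbourhood includes the object
    itself, so its own mass damps the move."""
    updated = []
    for x in reps:
        pull = [0.0] * len(x)
        total_mass = 0.0
        for y, m in zip(reps, masses):
            if math.dist(x, y) <= eps:
                total_mass += m
                for d in range(len(x)):
                    pull[d] += m * math.sin(y[d] - x[d])
        updated.append([x[d] + pull[d] / total_mass for d in range(len(x))])
    return updated

def synchronize(reps, masses, eps, max_iter=100, tol=1e-6):
    """Repeat sync_step until no representative moves more than tol."""
    for _ in range(max_iter):
        updated = sync_step(reps, masses, eps)
        shift = max(math.dist(a, b) for a, b in zip(reps, updated))
        reps = updated
        if shift < tol:
            break
    return reps

# Two well-separated groups; the first two points share a grid cell.
points = [[0.0, 0.0], [0.1, 0.1], [0.4, 0.4], [5.0, 5.0], [5.3, 5.3]]
reps, masses = merge_by_grid(points, cell=0.25)
final = synchronize(reps, masses, eps=1.0)
```

After synchronization, representatives that ended up at (almost) the same position can be read as one cluster. In this sketch the merged representative with mass 2 moves less than its lighter neighbour, which is the kind of mass difference the abstract says the enhanced Kuramoto model has to account for.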

Highlights

  • Clustering is an unsupervised technique for uncovering hidden rules and patterns in human society; it is an indispensable means of mining complex real-world data [1]

  • The first category is subspace clustering, which first treats each dimension as a subspace in which to perform local clustering, and then integrates the local clustering results from the different subspaces into a final result based on their local correlation

  • This paper proposes an enhanced synchronization-inspired clustering algorithm for high-dimensional data, called SyncHigh

Summary

Introduction

Clustering is an unsupervised technique for uncovering hidden rules and patterns in human society; it is an indispensable means of mining complex real-world data [1]. Existing methods for clustering high-dimensional data fall mainly into two categories. The first category is subspace clustering, which first treats each dimension as a subspace in which to perform local clustering, and then integrates the local clustering results from the different subspaces into a final result based on their local correlation. The core of this approach is finding the appropriate local correlation between different subspaces. Agrawal et al. [4] weighted each dimension to determine the correlation of different subspaces; Chen et al. [5] further divided high-dimensional attributes into differ-

