Abstract

Cluster analysis is often used as a preprocessing step in data mining applications. It is an unsupervised learning method that seeks distributions and patterns in unlabeled data sets. Clustering algorithms have been studied for decades, but no single algorithm suits every purpose. This paper presents three fast synchronization clustering (FSynC) algorithms based on spatial index structures; each improves on the time complexity of the SynC algorithm. To describe these algorithms clearly, we first introduce several concepts. Theoretical analysis of the three FSynC algorithms shows that the time complexity can be reduced by using an R-tree index structure, or by combining a multi-dimensional grid cell partitioning method with a Red-Black tree structure, to construct the near neighbor point sets of all points. In the simulations, the SynC algorithm serves as the baseline for the three FSynC algorithms. In simulated experiments on several artificial data sets, seven UCI data sets, and three picture data sets, the three FSynC algorithms take less time than the SynC algorithm on many kinds of data sets. We therefore conclude that the three FSynC algorithms can replace the SynC algorithm at lower time cost on many kinds of data sets. Finally, we give several suggestions for future research.
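
The idea of building near neighbor point sets through grid cell partitioning can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `near_neighbor_sets` and the radius parameter `delta` are assumptions made for illustration, and a Python `dict` keyed by cell index stands in for the Red-Black tree mentioned above. Each point is placed in a cell of side `delta`, so only the point's own cell and the adjacent cells need to be searched rather than the whole data set.

```python
from collections import defaultdict
from itertools import product
import math


def near_neighbor_sets(points, delta):
    """For each point, list the indices of the other points within distance delta.

    Illustrative sketch only: `delta` is an assumed neighborhood radius, and a
    dict replaces the Red-Black tree as the cell lookup structure.
    """
    dim = len(points[0])

    def cell_of(p):
        # Index of the grid cell (side length delta) that contains p.
        return tuple(math.floor(x / delta) for x in p)

    # Bucket point indices by grid cell.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[cell_of(p)].append(i)

    # Two points within delta of each other differ by at most one cell per axis,
    # so checking the 3^dim surrounding cells is sufficient.
    offsets = list(product((-1, 0, 1), repeat=dim))

    neighbors = [[] for _ in points]
    for i, p in enumerate(points):
        home = cell_of(p)
        for off in offsets:
            cell = tuple(h + o for h, o in zip(home, off))
            for j in cells.get(cell, ()):
                if j != i and math.dist(p, points[j]) <= delta:
                    neighbors[i].append(j)
        neighbors[i].sort()
    return neighbors


points = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.2, 3.1)]
result = near_neighbor_sets(points, delta=1.0)
```

The number of cells inspected per point grows as 3^d with the dimension d, so a sketch like this is only attractive for low-dimensional data; how the paper handles higher dimensions is not stated in this abstract.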
