Abstract

To improve the efficiency of finding arbitrarily shaped clusters in large data sets and to reduce the adverse effect of noise on clustering accuracy, an arbitrary-shape clustering algorithm based on density accumulation is proposed. Drawing on the idea of particle coagulation, the algorithm first generates a small-scale subset with a single scan of the large original data set and assigns each point in the subset a weight. Second, noise is removed from the weighted subset according to the weight distribution of its points, so that the structures and shapes of the clusters become clear. Finally, arbitrarily shaped clusters are found in the weighted subset using existing clustering algorithms, such as hierarchical, density-based, or spectral clustering, and the cluster structure of the original data set is then represented by that of the weighted subset. Experimental results show that the proposed method achieves high clustering efficiency and accuracy and effectively suppresses noise in the data set.
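The three stages can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: coagulation is modeled as a leader-style single pass in which each point is absorbed into the nearest representative within a fixed merge `radius`; noise is any representative whose weight falls below a hypothetical `min_weight`; and the final step uses single-linkage connected components with a cutoff `link_dist` as a stand-in for whichever hierarchical, density-based, or spectral algorithm the paper applies. All function and parameter names are illustrative, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def coagulate(points: Sequence[Point], radius: float) -> Tuple[List[Point], List[int]]:
    """One scan over the data (assumed leader-style coagulation): absorb each point
    into the nearest representative within `radius`, else start a new one."""
    centers: List[List[float]] = []
    weights: List[int] = []
    for p in points:
        best, best_d = -1, radius
        for i, c in enumerate(centers):
            d = math.dist(p, c)
            if d <= best_d:
                best, best_d = i, d
        if best < 0:
            centers.append(list(p))  # new representative with weight 1
            weights.append(1)
        else:
            w = weights[best]
            # Incremental weighted mean keeps the representative at the centroid
            # of all points it has absorbed; the weight counts those points.
            centers[best] = [(c * w + x) / (w + 1) for c, x in zip(centers[best], p)]
            weights[best] = w + 1
    return [tuple(c) for c in centers], weights


def remove_noise(centers: List[Point], weights: List[int], min_weight: int):
    """Drop low-weight representatives, treating them as noise (hypothetical rule)."""
    return [(c, w) for c, w in zip(centers, weights) if w >= min_weight]


def link_clusters(kept, link_dist: float) -> List[int]:
    """Single-linkage stand-in: representatives within `link_dist` of each other
    end up in one connected component, so chains can follow arbitrary shapes."""
    n = len(kept)
    labels = [-1] * n
    label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = label
        stack = [start]
        while stack:  # depth-first flood fill over the distance graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and math.dist(kept[i][0], kept[j][0]) <= link_dist:
                    labels[j] = label
                    stack.append(j)
        label += 1
    return labels


if __name__ == "__main__":
    # Two tight groups plus one isolated outlier at (5, 5).
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5),
           (10, 10), (10.1, 10), (10, 10.1), (10.1, 10.1)]
    centers, weights = coagulate(pts, radius=0.5)
    print(weights)  # [4, 1, 4]
    kept = remove_noise(centers, weights, min_weight=2)
    print(link_clusters(kept, link_dist=1.0))  # [0, 1]
```

Only the weighted representatives reach the clustering step, which is where the efficiency gain described above would come from: the final algorithm runs on a subset much smaller than the original data. Mapping each original point to the label of its nearest kept representative would carry the cluster structure back to the full data set.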
