Abstract

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) identifies connected high-density regions as clusters, which gives it an advantage in discovering clusters of arbitrary shape. However, its parameters are difficult to tune, and because it must scan every data point in turn, its time complexity is O(n²). A granular-ball (GB) is a coarse-grained representation of data. It is based on the assumption that an object and its local neighbors have similar distributions and are therefore highly likely to belong to the same class. Xia et al. introduced granular-balls into supervised learning to improve its efficiency. Inspired by this idea, we introduce granular-balls into unsupervised learning and use them to improve the efficiency of DBSCAN; we call the resulting algorithm GB-DBSCAN. The main idea of GB-DBSCAN is to represent a set of data points by a granular-ball and then cluster the granular-balls instead of the individual data points. First, we generate granular-balls with k-nearest neighbors (KNN), a bottom-up strategy, and describe each granular-ball by its center and radius. Then, the granular-balls are divided into Core-GBs and Non-Core-GBs according to their density. After that, the Core-GBs are merged into clusters following the idea of DBSCAN, and the Non-Core-GBs are assigned to the appropriate clusters. Since the number of granular-balls is much smaller than the number of objects in a dataset, the running time of DBSCAN is greatly reduced. Compared with KNN-BLOCK DBSCAN, RNN-DBSCAN, DBSCAN, K-means, DP and SNN-DPC, the proposed algorithm achieves similar or even better clustering results in much less running time.
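The four steps summarized in the abstract (KNN-based ball generation, the Core-GB/Non-Core-GB split, DBSCAN-style merging of Core-GBs, and assignment of Non-Core-GBs) can be illustrated with the minimal Python sketch below. It is not the authors' implementation, and several rules in it are our own assumptions: each still-unassigned point is grouped with those of its k nearest neighbors that are also unassigned; a ball counts as a Core-GB when it holds at least `min_pts` points within a radius of at most `eps`; two balls are linked when the gap between their surfaces is at most `eps`. The names `make_granular_balls`, `gap` and `gb_dbscan` are hypothetical. The k-NN search is brute force for readability, so this sketch does not reproduce the reported running-time savings.

```python
import math
from typing import List, Sequence, Tuple

# A granular-ball: (indices of member points, center, radius)
Ball = Tuple[List[int], Tuple[float, ...], float]


def make_granular_balls(points: Sequence[Sequence[float]], k: int) -> List[Ball]:
    """Bottom-up generation (assumed rule): each unassigned point is grouped
    with those of its k nearest neighbours that are still unassigned."""
    n, dim = len(points), len(points[0])
    assigned = [False] * n
    balls: List[Ball] = []
    for i in range(n):
        if assigned[i]:
            continue
        # Brute-force k-NN for clarity; a k-d tree would avoid this O(n) scan.
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        members = [i] + [j for j in order[: k + 1] if j != i and not assigned[j]]
        for j in members:
            assigned[j] = True
        center = tuple(sum(points[j][d] for j in members) / len(members) for d in range(dim))
        radius = max(math.dist(center, points[j]) for j in members)
        balls.append((members, center, radius))
    return balls


def gap(a: Ball, b: Ball) -> float:
    """Distance between the two ball surfaces (negative when they overlap)."""
    return math.dist(a[1], b[1]) - a[2] - b[2]


def gb_dbscan(points, k: int = 4, eps: float = 1.0, min_pts: int = 3) -> List[int]:
    """Cluster granular-balls instead of points; returns one label per point (-1 = noise)."""
    balls = make_granular_balls(points, k)
    # Core-GB test (assumed): enough members packed within radius eps.
    core_ids = [b for b, (m, _, r) in enumerate(balls) if len(m) >= min_pts and r <= eps]
    parent = list(range(len(balls)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge reachable Core-GBs with union-find, as DBSCAN chains core points.
    for pos, a in enumerate(core_ids):
        for b in core_ids[pos + 1:]:
            if gap(balls[a], balls[b]) <= eps:
                parent[find(a)] = find(b)

    # Relabel union-find roots as consecutive cluster ids 0, 1, 2, ...
    cluster_of_ball = [-1] * len(balls)
    root_to_label = {}
    for b in core_ids:
        root = find(b)
        root_to_label.setdefault(root, len(root_to_label))
        cluster_of_ball[b] = root_to_label[root]

    # Non-Core-GBs join the nearest reachable Core-GB's cluster, else stay noise.
    core_set = set(core_ids)
    for b in range(len(balls)):
        if b in core_set or not core_ids:
            continue
        nearest = min(core_ids, key=lambda a: gap(balls[a], balls[b]))
        if gap(balls[nearest], balls[b]) <= eps:
            cluster_of_ball[b] = cluster_of_ball[nearest]

    # Every point inherits the label of the ball it belongs to.
    labels = [-1] * len(points)
    for b, (members, _, _) in enumerate(balls):
        for j in members:
            labels[j] = cluster_of_ball[b]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
           (5, 20)]
    print(gb_dbscan(pts, k=4, eps=1.0, min_pts=3))
    # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

In this toy run the 11 points collapse into 3 granular-balls, so the merging step compares 2 Core-GBs instead of 11 points; this reduction in the number of objects to cluster is the source of the speedup the abstract describes.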
