Local gap density for clustering high-dimensional data with varying densities

Ruijia Li, Xiaofei Yang, Xiaolong Qin, William Zhu

doi:10.1016/j.knosys.2019.104905

Abstract

Density-based clustering algorithms are designed to find clusters of arbitrary shape. However, most of these algorithms have difficulty handling high-dimensional data with varying densities; in particular, they often fail to discover clusters in sparse regions. In this paper, we define a new type of density, the local gap density, on the k-NN graph, which works well for high-dimensional data. The local gap density of a point takes into account not only the number of points in its nearest neighborhood but also the average distance from the point to the points in that neighborhood. As a result, points that are core points of sparse regions, and that existing density-based methods would score as low-density, receive high densities under our definition and can be detected easily. Using the core points, the potential cross-cluster edges in the k-NN graph can be identified. After deleting these edges, we group the points of each connected component with large cardinality into a subcluster and then, as in density peaks clustering, assign each remaining point to an existing subcluster. Extensive experiments on eight publicly available datasets demonstrate the effectiveness of our clustering algorithm.
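
To make the two ingredients named in the abstract concrete, the following minimal Python sketch computes, for each point, a neighbor count and the average distance to its k nearest neighbors. It is an illustration under stated assumptions, not the paper's definition: the abstract does not give the local gap density formula, so the sketch does not combine the two quantities. The function name `knn_ingredients`, the brute-force Euclidean search, and the choice to count mutual k-NN neighbors (so that the count can differ between points) are all assumptions made here.

```python
import math


def knn_ingredients(points, k):
    """Return (neighbors, mutual_count, mean_dist) for a brute-force k-NN graph.

    Hypothetical helper: it only exposes the two quantities the abstract
    mentions and does NOT implement the paper's local gap density formula.
    """
    n = len(points)
    neighbors = []   # neighbors[i] = indices of the k nearest points to i
    mean_dist = []   # mean_dist[i] = average distance from i to those points
    for i in range(n):
        # Euclidean distances to every other point, sorted ascending
        # (ties are broken by the smaller index).
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        nearest = dists[:k]
        neighbors.append([j for _, j in nearest])
        mean_dist.append(sum(d for d, _ in nearest) / k)

    # Assumption: count mutual neighbors, i.e. j is in i's k-NN list
    # and i is in j's k-NN list.
    neighbor_sets = [set(nb) for nb in neighbors]
    mutual_count = [
        sum(1 for j in neighbor_sets[i] if i in neighbor_sets[j])
        for i in range(n)
    ]
    return neighbors, mutual_count, mean_dist


if __name__ == "__main__":
    pts = [[0.0], [1.0], [2.5], [10.0]]
    nbrs, counts, avg = knn_ingredients(pts, k=1)
    print(nbrs, counts, avg)
```

In this toy run the isolated point at 10.0 has a large average neighbor distance and no mutual neighbors, which is the kind of signal a density defined on the k-NN graph can draw on.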

Full Text