Local gap density for clustering high-dimensional data with varying densities

Ruijia Li, Xiaofei Yang, Xiaolong Qin, William Zhu

doi:10.1016/j.knosys.2019.104905

Abstract

Density-based clustering algorithms are designed to cluster data of arbitrary shape. However, most of these algorithms have difficulty handling high-dimensional data with varying densities; in particular, they fail to discover clusters in sparse regions. In this paper, we define a new type of density, the local gap density, on the k-NN graph, which works well for high-dimensional data. The local gap density of a point takes into account not only the number of points in its nearest neighborhood but also the average distance from the point to all points in that neighborhood. As a result, core points lying in regions that existing density-based methods regard as sparse receive high densities under our definition, so they are easily detected. Using the core points, the potential cross-cluster edges in the k-NN graph can be identified. After deleting these edges, we group the points of each connected component with large cardinality into a subcluster and then, as in density peaks clustering, assign each remaining point to an existing subcluster. Extensive experiments on eight publicly available datasets demonstrate the effectiveness of our clustering algorithm.
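The pipeline outlined in the abstract (k-NN graph, core points, edge deletion, large components, assignment of leftovers) can be illustrated with a minimal sketch. The abstract does not state the local gap density formula, so `placeholder_density` below is a hypothetical stand-in that merely combines a neighbor count with average k-NN distances; it is not the authors' definition. The edge rule (keep only edges joining two core points), the parameters `core_quantile` and `min_size`, and the simplified nearest-labeled-point assignment are likewise assumptions made for illustration. Points are assumed to be distinct.

```python
import math
from collections import deque

def knn(points, k):
    """Brute-force k-NN: for each point, its k nearest (distance, index) pairs."""
    out = []
    for i, p in enumerate(points):
        d = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append(d[:k])
    return out

def placeholder_density(nbrs):
    """Hypothetical stand-in score, NOT the paper's local gap density.

    Combines the number of points that list i among their k-NN with the
    ratio of the neighbors' mean k-NN distance to i's own mean k-NN distance.
    """
    mean_d = [sum(d for d, _ in nb) / len(nb) for nb in nbrs]
    indeg = [0] * len(nbrs)
    for nb in nbrs:
        for _, j in nb:
            indeg[j] += 1
    score = []
    for i, nb in enumerate(nbrs):
        local = sum(mean_d[j] for _, j in nb) / len(nb)
        score.append(indeg[i] * local / mean_d[i])
    return score

def cluster(points, k=3, core_quantile=0.6, min_size=3):
    n = len(points)
    nbrs = knn(points, k)
    score = placeholder_density(nbrs)
    # Assumed rule: points at or above a score quantile are core points.
    cutoff = sorted(score)[min(n - 1, int(core_quantile * n))]
    core = [s >= cutoff for s in score]
    # Assumed rule: an edge survives only if both endpoints are core points.
    adj = [set() for _ in range(n)]
    for i, nb in enumerate(nbrs):
        for _, j in nb:
            if core[i] and core[j]:
                adj[i].add(j)
                adj[j].add(i)
    # Connected components with at least min_size points become subclusters.
    labels, seen, next_label = [-1] * n, set(), 0
    for s in range(n):
        if not core[s] or s in seen:
            continue
        comp, queue = [s], deque([s])
        seen.add(s)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.append(v)
                    queue.append(v)
        if len(comp) >= min_size:
            for v in comp:
                labels[v] = next_label
            next_label += 1
    # Simplified assignment loosely modeled on density peaks clustering:
    # in decreasing score order, take the label of the nearest labeled point.
    if next_label:
        for i in sorted(range(n), key=lambda i: -score[i]):
            if labels[i] == -1:
                j = min((j for j in range(n) if labels[j] != -1),
                        key=lambda j: math.dist(points[i], points[j]))
                labels[i] = labels[j]
    return labels

# Toy data: a dense 3x3 grid and a sparse 3x3 grid placed far apart.
dense = [(float(x), float(y)) for x in range(3) for y in range(3)]
sparse = [(100.0 + 5 * x, 5.0 * y) for x in range(3) for y in range(3)]
labels = cluster(dense + sparse)
```

Because the stand-in score is a ratio of distances, it does not depend on the absolute spacing of a region, which is one simple way to let points in a sparse region qualify as core points. The paper's actual definition may differ substantially.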

Journal: Knowledge-Based Systems
Publication Date: Aug 12, 2019
Citations: 25
