A Novel Local Density Hierarchical Clustering Algorithm Based on Reverse Nearest Neighbors

Yaohui Liu, Dong Liu, Fang Yu, Zhengming Ma

doi:10.1155/2019/2959017

Open Access

Abstract

Clustering is widely used in data analysis, and density-based methods have developed rapidly over the past 10 years. Although state-of-the-art density peak clustering algorithms are efficient and can detect clusters of arbitrary shape, they are essentially a nonspherical type of centroid-based method. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, the extended core regions are identified as initial clusters. Then, a new local density metric is defined to calculate the density of each object; meanwhile, the density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is assigned to one of the initial clusters or labeled as noise. Results of experiments on synthetic and real data sets show that RNN-LDH outperforms current clustering methods based on density peaks or reverse nearest neighbors.
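
To make the first step of the abstract more concrete, the following Python sketch builds k-nearest-neighbor lists, inverts them into reverse nearest neighbor sets R_k(x), and groups core objects into initial clusters. It is a minimal illustration under stated assumptions, not the RNN-LDH implementation: the core-object rule |R_k(x)| >= k is an assumption borrowed from RNN-DBSCAN-style methods, the function names knn, reverse_knn, core_objects and initial_clusters are hypothetical, and the extension of core regions, the new local density metric and the density hierarchy are not reproduced because the abstract does not define them.

```python
import math
from typing import Dict, List, Sequence, Set

Point = Sequence[float]


def knn(points: List[Point], k: int) -> List[List[int]]:
    """Indices of the k nearest neighbors of each point (the point itself excluded)."""
    result = []
    for i, p in enumerate(points):
        # Sort the other points by Euclidean distance to p, breaking ties by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        result.append(others[:k])
    return result


def reverse_knn(neighbors: List[List[int]]) -> List[Set[int]]:
    """R_k(x): the set of points that list x among their k nearest neighbors."""
    rnn: List[Set[int]] = [set() for _ in neighbors]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            rnn[j].add(i)
    return rnn


def core_objects(rnn: List[Set[int]], k: int) -> List[int]:
    # ASSUMPTION (RNN-DBSCAN-style rule): x is a core object if |R_k(x)| >= k.
    return [i for i, r in enumerate(rnn) if len(r) >= k]


def initial_clusters(neighbors: List[List[int]], cores: List[int]) -> Dict[int, int]:
    """Label connected components of the kNN graph restricted to core objects."""
    core_set = set(cores)
    adj: Dict[int, Set[int]] = {c: set() for c in cores}
    for i in cores:
        for j in neighbors[i]:
            if j in core_set:  # treat kNN edges between cores as undirected
                adj[i].add(j)
                adj[j].add(i)
    labels: Dict[int, int] = {}
    next_label = 0
    for start in cores:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            for v in adj[u]:
                if v not in labels:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    k = 3
    nbrs = knn(pts, k)
    cores = core_objects(reverse_knn(nbrs), k)
    print(cores, initial_clusters(nbrs, cores))
```

With k = 3, the two unit squares in the demo become two initial clusters, while the isolated point (50, 50) has an empty reverse neighbor set, is not a core object, and would be left for a later assignment or noise step.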

Highlights

Clustering is the task of finding a set of groups in which similar objects are placed in the same group and dissimilar objects are separated into different groups
To evaluate the performance of RNN-LDH, we perform a set of experiments on synthetic and real-world data sets that are commonly used to test the performance of clustering algorithms
We compare the performance of RNN-LDH with well-known clustering algorithms including RNN-DSC in [15], IS-DSC in [16], ISB-DSC in [6], and ADPC in [5]. Three popular criteria, F1 measure (F1) [19], adjusted mutual information (AMI), and adjusted Rand index (ARI) [20], are used to evaluate the performance of the above clustering algorithms. The upper bounds of these criteria are all 1.0; the better the clustering, the larger the values
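
Of the three criteria, ARI has a standard closed form over the contingency table of true and predicted labels, so it is the easiest to illustrate. The sketch below is a from-scratch Python version (the function name adjusted_rand_index is hypothetical, not from the paper); the F1 variant of [19] and AMI are not reproduced here. It returns 1.0 for identical partitions up to relabeling, consistent with the upper bound stated above, and is 0 in expectation for random labelings.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """ARI between two partitions given as per-object label sequences."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no object pairs to compare
    # Contingency counts n_ij: objects in true class i and predicted cluster j.
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only when both partitions are trivial and identical (one cluster, or all singletons).
        return 1.0
    return (index - expected) / (max_index - expected)
```

Because ARI depends only on which objects share a cluster, swapping label names (for example [0, 0, 1, 1] against [1, 1, 0, 0]) still yields 1.0.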

Summary

Introduction

Clustering is the task of finding a set of groups in which similar objects are placed in the same group and dissimilar objects are separated into different groups. FKNN-DPC [8] defines a uniform local density metric based on the k-nearest neighbors and uses a fuzzy technique to complete the assignment procedure after the cluster centres have been identified manually. RECOME [10] defines a new density measure as the ratio of each object's density to the maximum density of its k-nearest neighbors and uses a divide-and-conquer strategy to partition a data set. Although these algorithms have improved DPC in some respects, they still suffer from some drawbacks of centroid-based methods. PIDC [18] uses the size of the unique closest neighbor set as an estimate of object density and uses growing strategies to complete the clustering. Although this method is parameter-independent, it is sensitive to noise and has high computational complexity.
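
The two density notions mentioned for FKNN-DPC and RECOME can be sketched as follows. This Python example assumes an exponential k-nearest-neighbor kernel, rho_i = sum over j in kNN(i) of exp(-d_ij), as the base density, and then forms the RECOME-style ratio of each object's density to the maximum density among its k nearest neighbors, following the description above. The exact kernels and any bandwidth parameters used in [8] and [10] may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn(points: List[Point], k: int) -> List[List[int]]:
    """Indices of the k nearest neighbors of each point (the point itself excluded)."""
    result = []
    for i, p in enumerate(points):
        # Sort the other points by Euclidean distance to p, breaking ties by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        result.append(others[:k])
    return result


def knn_density(points: List[Point], neighbors: List[List[int]]) -> List[float]:
    # ASSUMED kernel: rho_i = sum over j in kNN(i) of exp(-d_ij).
    return [
        sum(math.exp(-math.dist(points[i], points[j])) for j in nbrs)
        for i, nbrs in enumerate(neighbors)
    ]


def relative_density(rho: List[float], neighbors: List[List[int]]) -> List[float]:
    # RECOME-style ratio: rho_i divided by the largest density among i's k nearest neighbors.
    return [rho[i] / max(rho[j] for j in nbrs) for i, nbrs in enumerate(neighbors)]


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
    nbrs = knn(pts, 2)
    rho = knn_density(pts, nbrs)
    print(relative_density(rho, nbrs))
```

In the demo, the two interior points of the run 0, 1, 2, 3 are at least as dense as all of their neighbors and get a ratio of 1.0, while the distant point at 10 gets a ratio close to 0, which is how a relative measure separates local density maxima from sparse objects regardless of the absolute density scale.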

RNN-LDH Algorithm

Procedures of the RNN-LDH Algorithm

Results and Discussion

Conclusions