Abstract

Density Peaks Clustering (DPC) is a density-based clustering algorithm whose advantages include requiring no preset clustering parameters and detecting non-spherical clusters. However, the original algorithm obtains cluster centers by manually specifying the cutoff distance and manually selecting centers from the decision graph, so the centers are not chosen with the whole data set taken into account. This paper proposes a method, G-KNN-DPC, that calculates the cutoff distance based on the Gini coefficient and selects centers with K-nearest neighbor (KNN): it first finds the optimal cutoff distance using the Gini coefficient and then identifies the center points using KNN. Automatic center selection not only avoids the error of detecting two center points in one cluster but also addresses the traditional DPC algorithm's inability to handle complex data sets. Compared with DPC, Fuzzy C-Means, K-means, KDPC and DBSCAN, the proposed algorithm produces better clusters on a range of data sets.
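To make the DPC workflow the abstract refers to concrete, the sketch below computes the two standard DPC decision quantities, local density rho and delta (the distance to the nearest higher-density point), and picks centers as the points with the largest rho * delta. This is a minimal illustration of plain DPC with a Gaussian-kernel density and a fixed cutoff distance; the paper's Gini-based cutoff selection and KNN center rule are not reproduced here.

```python
import numpy as np

def dpc_quantities(X, dc):
    """Compute the two DPC decision quantities for every point:
    rho   - local density (Gaussian kernel with cutoff distance dc)
    delta - distance to the nearest point of higher density."""
    n = len(X)
    # Pairwise Euclidean distances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Gaussian-kernel local density; subtract 1 to exclude the point itself
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if higher.size:
            delta[i] = d[i, higher].min()
        else:
            # Convention: the highest-density point gets the maximum distance
            delta[i] = d[i].max()
    return rho, delta

# Two well-separated blobs; the two density peaks should score highest
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
rho, delta = dpc_quantities(X, dc=0.5)
centers = np.argsort(rho * delta)[-2:]   # top-2 rho * delta -> one per blob
```

In a manual DPC run the user inspects the (rho, delta) decision graph and picks the outlying points by eye; automating that choice is exactly the gap G-KNN-DPC targets.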

Highlights

  • Big data has been rapidly and widely used in fields such as physics, biological engineering, and life medicine [1]

  • Based on many improvements to the density peaks clustering algorithm [18]–[26] and outlier detection strategies [27], [28], we propose a method to calculate the cutoff distance based on the Gini coefficient and to find center points by K-nearest neighbor (KNN)

  • The results demonstrate that G-KNN-DPC accounts for the true distribution of a data set and achieves better performance
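The second highlight says the cutoff distance is chosen using the Gini coefficient. The exact objective G-KNN-DPC optimizes is not spelled out in this summary, but the discrete Gini coefficient itself, a measure of how unequally a set of non-negative values (here, point densities) is distributed, can be sketched as:

```python
import numpy as np

def gini(values):
    """Discrete Gini coefficient of a non-negative sample.
    Returns 0 when all values are equal; values near 1 mean a few
    elements dominate the total."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    total = v.sum()
    # G = 2 * sum_i(i * v_i) / (n * sum(v)) - (n + 1) / n, with i = 1..n
    return 2.0 * np.sum(np.arange(1, n + 1) * v) / (n * total) - (n + 1) / n

# Equal densities -> no inequality; one dominant value -> high inequality
g_equal = gini([1, 1, 1, 1])
g_skewed = gini([0, 0, 0, 10])
```

A Gini-based cutoff search would presumably evaluate candidate cutoff distances, compute the densities each induces, and keep the candidate whose density distribution best satisfies the chosen Gini criterion; that search loop is an assumption here, since the paper's exact rule is not given in this summary.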


Summary

INTRODUCTION

Big data has been rapidly and widely used in fields such as physics, biological engineering, and life medicine [1]. K-nearest neighbor (KNN) is a simple and efficient classification algorithm that can handle text and stream data classification problems [14]; because it also performs well in clustering, the KNN idea has repeatedly been introduced into the DPC algorithm. One such algorithm uses KNN to estimate the density of each point and principal component analysis to reduce the dimensionality of the data, improving the handling of high-dimensional data and achieving a good clustering effect [16]. Building on the many improvements to the density peaks clustering algorithm [18]–[26] and on outlier detection strategies [27], [28], we propose a method that calculates the cutoff distance based on the Gini coefficient and finds center points by KNN. The conclusions and expected future work are given in the last section.
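The KNN-based density estimation mentioned above can be sketched as follows. The paper's exact formula is not given in this summary; the version below uses a common exponential KNN estimate, where a point whose k nearest neighbors are close receives a high density score.

```python
import numpy as np

def knn_density(X, k=5):
    """KNN-style local density: exp(-mean distance to the k nearest
    neighbours). This is one common estimate used by KNN variants of
    DPC; the formula in G-KNN-DPC may differ."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sort each row; column 0 is the point itself (distance 0), so skip it
    knn_d = np.sort(d, axis=1)[:, 1:k + 1]
    return np.exp(-knn_d.mean(axis=1))

# A tight cluster should score much higher than scattered noise
rng = np.random.default_rng(1)
dense = rng.normal(0, 0.05, (30, 2))    # tight cluster near the origin
sparse = rng.uniform(-3, 3, (10, 2))    # scattered background points
X = np.vstack([dense, sparse])
rho = knn_density(X, k=5)
```

Unlike a cutoff-kernel density, this estimate needs no cutoff distance of its own, which is one reason KNN ideas pair naturally with automatic center selection in DPC variants.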

RELATED WORK
CALCULATE THE CUTOFF DISTANCE BASED ON GINI COEFFICIENT
FIND THE CENTER POINTS BY USING K-NEAREST NEIGHBOR
EXPERIMENTS AND ANALYSIS
DECISION GRAPHS COMPARATIVE ANALYSIS
Findings
CONCLUSION