Density Peak Clustering Algorithm Considering Topological Features

Shuyi Lu, Rong Luo, Weikuan Jia, Chengjiang Li, Jian Lian, Yuanjie Zheng

doi:10.3390/electronics9030459

Abstract

Clustering algorithms play an important role in data mining and image processing. Advances in their precision and methodology directly shape the direction and progress of subsequent research. Current clustering algorithms are mainly divided into hierarchical, density-based, grid-based and model-based types. This paper studies the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm, a recent density-based clustering method. The algorithm requires no iterative process, has few parameters and achieves high precision. However, we found that it does not consider the original topological characteristics of the data. We also found that clustering data resembles the social network nodes described in DeepWalk, which follow a power-law distribution. In this study, we incorporate the topological characteristics of the graph into the clustering algorithm. Building on previous studies, we propose a clustering algorithm that adds the topological characteristics of the original data to the CFSFDP algorithm. Our experimental results show that the clustering algorithm with topological features significantly improves the clustering effect, indicating that adding topological features is effective and feasible.

Highlights

With the advent of the era of big data, information grows rapidly [1,2]
To address the fact that the CFSFDP algorithm does not consider the topological characteristics of data, we propose using the DeepWalk algorithm to represent the latent information in the data, and applying this representation as topological features in clustering to improve its accuracy
By summarizing and reflecting on the classical clustering algorithms, we find that they do not consider the topological characteristics of the data set

Summary

Introduction

The influx of massive data makes the statistics and screening of important information more difficult. Cluster analysis is an important statistical analysis method, which is mainly used to solve classification problems. It is a technique for grouping similar information into meaningful subclasses of data and for finding patterns embedded in the underlying structure of massive data [3,4,5]. Due to its wide applicability, many clustering algorithms have been invented, including K-means, the Affinity Propagation (AP) algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Ordering Points to Identify the Clustering Structure (OPTICS) and the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm. Cluster analysis has been widely used in computer vision, database knowledge discovery, image processing and other fields [6,7,8,9].
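
To make the density-peak idea behind CFSFDP concrete, the following is a minimal sketch of the baseline algorithm only, without the DeepWalk-derived topological features this paper adds. It assumes the cutoff-kernel density (the number of neighbours within a distance `d_c`) and picks centers as the points with the largest product of density and separation, a common shortcut in place of reading them off a decision graph. The function name `density_peaks` and the parameters `d_c` and `n_clusters` are illustrative choices, not taken from the paper.

```python
import numpy as np

def density_peaks(X, d_c, n_clusters):
    """Baseline density-peak clustering sketch (cutoff kernel, no iteration)."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from densest to sparsest (ties broken by index).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)          # distance to the nearest denser point
    parent = np.full(n, -1)      # index of that nearest denser point
    # Assumption of this sketch: the densest point gets the global maximum
    # distance, which guarantees it is selected as a center below.
    delta[order[0]] = dist.max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Centers: the n_clusters points with the largest rho * delta.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Single pass: each remaining point inherits the label of its nearest
    # denser neighbour, which has already been labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

Calling `density_peaks(X, d_c=0.12, n_clusters=2)` on an array `X` of shape `(n_points, n_features)` returns one label per point together with the `rho` and `delta` values. The single labelling pass is what the abstract means by the absence of an iterative process, and `d_c` is the main parameter to tune.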

Objectives

Methods

Discussion

Conclusion