A Multi-Density Clustering Algorithm Based on Similarity for Dataset With Density Variation

Xingxing Zhou, Genlin Ji, Haiping Zhang, Guoan Tang

doi:10.1109/access.2019.2960159

Open Access

https://doi.org/10.1109/access.2019.2960159

Abstract

Clustering has been widely used in knowledge discovery, pattern recognition, and artificial intelligence. However, discovering clusters in spatial databases remains challenging, especially when the shape, size, and density of clusters vary widely. Existing algorithms have sensitive parameters, require clusters to be well separated from each other, and demand rich prior knowledge about the dataset. In this paper, we propose the algorithm DENSS, which clusters on the basis of two measures for a pair of objects: the similarity of their neighbor distributions and the number of neighbors they share. DENSS can mine clusters that differ in density, while the local density within each cluster is reasonably homogeneous. Adjacent objects are separated into different clusters where the density changes significantly. To verify the effectiveness of DENSS, we tested it on synthetic and real-world datasets and compared it with seven clustering algorithms. Experimental results show that the proposed algorithm is relatively efficient, robust, and effective, and is remarkably superior to the seven algorithms. The algorithm is universal: for any object feature type, it can rapidly and efficiently identify clusters of different densities, shapes, and sizes, even in the presence of noise and outliers.

Highlights

Clustering aims to divide objects in the datasets with different properties into their corresponding categories and has been widely used in knowledge discovery [1], pattern recognition [2], [3] and artificial intelligence [4], [5].
To solve the above problems, we propose a new clustering algorithm that identifies similar objects as a cluster based on the similarity between objects.
Given that this paper focuses on multi-density clustering, two synthetic datasets that contain multiple clusters of different densities were designed to fully demonstrate the performance of DENSS.

Summary

INTRODUCTION

Clustering aims to divide objects in the datasets with different properties into their corresponding categories and has been widely used in knowledge discovery [1], pattern recognition [2], [3] and artificial intelligence [4], [5]. To address the limitations of existing algorithms, we propose a new clustering algorithm that groups similar objects into a cluster based on the similarity between objects. This similarity has two main aspects: the similarity of the neighbor distributions of the two objects and the similarity of their shared neighbors. The main contribution is a new multi-density clustering algorithm. This algorithm can identify clusters of any size, any shape, and any density, and it is robust against noise. Even clusters without distinct division zones can be identified perfectly. The parameters of the algorithm are simple to set and insensitive, and users are not required to have rich prior knowledge.
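The two aspects of similarity named above can be illustrated with a minimal sketch. This is not the DENSS definition, which this summary does not give; it is a generic shared-nearest-neighbor count plus a hypothetical density proxy. The function names, the choice of k, the use of the mean k-nearest-neighbor distance as a stand-in for "neighbor distribution", and the toy points are all assumptions made for illustration.

```python
import math

def k_nearest(points, k):
    """For each point, return the indices of its k nearest other points."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result

def shared_neighbor_count(neighbors, i, j):
    """Number of neighbors that objects i and j have in common."""
    return len(set(neighbors[i]) & set(neighbors[j]))

def mean_neighbor_distance(points, neighbors, i):
    """Hypothetical local-density proxy: mean distance from i to its k neighbors."""
    total = sum(math.dist(points[i], points[j]) for j in neighbors[i])
    return total / len(neighbors[i])

def density_ratio(points, neighbors, i, j):
    """Assumed similarity in (0, 1]: 1.0 means equal local density."""
    a = mean_neighbor_distance(points, neighbors, i)
    b = mean_neighbor_distance(points, neighbors, j)
    if max(a, b) == 0:
        return 1.0  # both objects coincide with all of their neighbors
    return min(a, b) / max(a, b)

# Toy data: a dense square (indices 0-3) and a sparse square (indices 4-7).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 14), (14, 10), (14, 14)]
neighbors = k_nearest(points, k=2)

same_cluster = (shared_neighbor_count(neighbors, 0, 3),
                density_ratio(points, neighbors, 0, 3))
different_cluster = (shared_neighbor_count(neighbors, 0, 4),
                     density_ratio(points, neighbors, 0, 4))
```

In this toy setup, objects 0 and 3 share both neighbors and have equal local density, whereas objects 0 and 4 share no neighbors and differ in density by a factor of four. A clustering rule in the spirit of the text would link a pair only when both measures are high; the actual thresholds and formulas used by DENSS are defined in the paper itself.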

RELATED WORKS

EXPERIMENTS AND ANALYSIS

Findings

CONCLUSION