An Improved Numerical DBSCAN Algorithm Based on Non-IIDness Learning

Yanmei Wang, He Jiang, Chengqiang Wang, Huixin Zhou

doi:10.1109/access.2021.3081500

Abstract

In clustering research, the objects and attributes of a data set are usually assumed to be independent and identically distributed (IID); that is, each object is treated as an independent draw from the same distribution, with no influence between objects. However, real-world objects are often neither independent nor identically distributed; that is, they are non-IID, so objects interact with each other and complex coupling relationships arise between them. The results of a clustering algorithm that assumes IID data may therefore be incomplete or even misleading. To make the results of the DBSCAN algorithm as accurate as possible, this paper proposes an improved numerical DBSCAN algorithm based on non-IIDness learning. The algorithm calculates the coupling relationships between objects to capture their latent relationships, and it determines the parameters Eps and MinPts from the distribution characteristics of the data. Experiments on large-scale real and synthetic data sets show that the algorithm achieves higher accuracy than the original DBSCAN algorithm and the main improved variants of it.

Highlights

Clustering refers to the grouping of abstract or physical objects, without labeled samples, such that objects in the same group are as similar to each other as possible and objects in different groups are as different as possible; the ultimate purpose of clustering is to discover the natural structure of data [1].
For numerical data, we propose the NDBSCAN algorithm, which improves the DBSCAN algorithm for data that are not independent and identically distributed. This algorithm focuses on the non-IID characteristics of each object in the data set and uses the principle of coupling similarity to quantify the relationships between objects.
To cluster numerical data effectively, we propose an improved DBSCAN algorithm based on the idea of a non-independent and non-identical distribution, namely, the NDBSCAN algorithm.

Summary

INTRODUCTION

Clustering refers to the grouping of abstract or physical objects, without labeled samples, such that objects in the same group are as similar to each other as possible and objects in different groups are as different as possible; the ultimate purpose of clustering is to discover the natural structure of data [1]. [...] adjacent dense grids to form clusters. The AA-DBSCAN algorithm [27] uses a new tree structure based on a quadtree to define the density layers of the data set. This method allows AFM to find clusters of different densities more accurately. Apart from the non-IID DBSCAN algorithm, other improvements to the accuracy of the DBSCAN algorithm assume independent and identically distributed data. This assumption ignores the internal relations between data points.
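For orientation, the baseline that these variants build on can be sketched as follows. This is a minimal illustration of standard DBSCAN with plain Euclidean distance, not the NDBSCAN algorithm proposed in the paper: the function name `dbscan`, the sample points, and the convention that a point counts as its own neighbor are assumptions made for this sketch. It shows where Eps and MinPts enter the procedure and that the distance treats every attribute independently.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Standard DBSCAN with Euclidean distance; returns labels (-1 = noise).

    Assumption for this sketch: a point is counted among its own neighbors.
    """
    n = len(points)
    # Pairwise Euclidean distances; each attribute contributes independently.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]

    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_pts:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # unlabeled point joins the current cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels

# Hypothetical data: two tight groups and one isolated point.
pts = np.array([[0, 0], [0, 0.1], [0.1, 0],
                [5, 5], [5, 5.1], [5.1, 5],
                [10, 0]], dtype=float)
labels = dbscan(pts, eps=0.3, min_pts=3)
```

Both the neighborhood test `dist[i] <= eps` and the core-point test against `min_pts` depend on hand-chosen parameters here, whereas the abstract states that the proposed algorithm derives Eps and MinPts from the distribution of the data.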

REASONS TO IMPROVE THE COUPLING PROCESS IN DBSCAN

THE OPTIMIZED COUPLING ATTRIBUTE ANALYSIS OF NUMERICAL DATA

SELECTING THE EPS PARAMETER

SELECTING THE MinPts PARAMETER

EXPERIMENTAL RESULTS AND ANALYSIS

3: Calculate the quadratic power of each attribute
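The step above can be illustrated with a short sketch. It only shows the squaring itself; how the paper uses the squared attributes afterwards is not recoverable from this page. The function name `add_quadratic_powers` and the choice to append the squares as extra columns are assumptions made for illustration.

```python
import numpy as np

def add_quadratic_powers(data):
    """Append the square of every attribute (column) to the data matrix.

    Assumption: squared attributes are stored as extra columns next to
    the original ones; the paper may organize them differently.
    """
    data = np.asarray(data, dtype=float)
    return np.hstack([data, data ** 2])

# Hypothetical data set with two objects and two numerical attributes.
expanded = add_quadratic_powers([[1.0, 2.0], [3.0, 4.0]])
```

Each row of `expanded` holds the original attribute values followed by their squares, so the number of columns doubles.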

COMPLEXITY ANALYSIS

CONCLUSION

Journal: IEEE Access: Practical Innovations, Open Solutions
Publication Date: Jan 1, 2021
Citations: 2
License type: CC BY 4.0

