Abstract

Data clustering is an active research area in machine learning and knowledge discovery, and generating clusters of different densities remains a challenging task. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based clustering algorithm that has problems discovering clusters of varied density because it uses a fixed radius. Its quadratic time complexity also makes it difficult to use in real applications with large datasets. In this paper, a Density Extending Algorithm for Data Clustering (DEADC) is proposed to cluster datasets with different densities, sizes, and noise with better accuracy and less execution time. DEADC uses a dynamic radius variable called ϵ_Extended, based on statistical analysis, that assigns a regional density value to each cluster by extending the data points with the ϵ_Extended neighborhood. DEADC implicitly computes the empirical density for each created cluster, leading to linear time complexity. Experimental results show that DEADC effectively identifies clusters with varied densities in synthetic and real-world datasets, with a significant improvement in clustering accuracy and execution time.
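
The fixed-radius limitation of DBSCAN described above can be illustrated with a minimal sketch. This is a plain textbook DBSCAN, not the DEADC method proposed in the paper; the toy data, the parameter values, and the names `region_query` and `dbscan` are assumptions chosen only for illustration. The data contain two dense groups and one sparse group, and the brute-force neighbor search also shows where the quadratic cost comes from.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    # Brute-force neighbor search: O(n) per query, O(n^2) over all points.
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    # Textbook DBSCAN with one global radius eps (illustrative, not DEADC).
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels

# Hypothetical toy data: two dense groups (spacing 0.1) and one sparse group (spacing 1.0).
dense_a = [(i / 10, 0.0) for i in range(5)]
dense_b = [(1.0 + i / 10, 0.0) for i in range(5)]
sparse_c = [(10.0 + i, 0.0) for i in range(5)]
points = dense_a + dense_b + sparse_c

small = dbscan(points, eps=0.15, min_pts=3)  # tuned for the dense groups
large = dbscan(points, eps=1.05, min_pts=3)  # tuned for the sparse group

print(small)
print(large)
```

With the small radius the sparse group is labeled entirely as noise, and with the large radius the two dense groups merge into one cluster, so neither of these two global radius settings recovers all three groups in this toy case.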

