Abstract

Density-based spatial clustering of applications with noise (DBSCAN) is a fundamental algorithm for density-based clustering. It can discover clusters of arbitrary shapes and sizes in large amounts of data containing noise and outliers. However, it scales poorly to large datasets, does not perform well when new data objects are inserted into an existing database, does not completely remove noise points and outliers, and cannot handle the local density variation that exists within a cluster. A good clustering method should therefore tolerate significant density variation within a cluster and should be able to learn from large, dynamic databases. In this paper, we propose an enhancement of the DBSCAN algorithm based on incremental clustering, called AMF-IDBSCAN, which incrementally builds clusters of different shapes and sizes in large datasets and eliminates noise and outliers. The proposed AMF-IDBSCAN algorithm uses a canopy clustering algorithm to pre-cluster the dataset and decrease the volume of data, applies an incremental DBSCAN to cluster the data points, and uses an Adaptive Median Filtering (AMF) technique for post-clustering to reduce the number of outliers by replacing noise points with chosen medians. Experiments are conducted on datasets from the University of California Irvine (UCI) repository. The final results show that our algorithm obtains good results compared with the well-known DBSCAN, IDBSCAN, and DMDBSCAN algorithms.
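
To make the pre-clustering step more concrete, the following is a minimal sketch of generic canopy clustering with a loose threshold `t1` and a tight threshold `t2` (with `t1 > t2`). It is not the authors' implementation: the function name, the threshold values, and the choice of Euclidean distance are assumptions made only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def canopy(points, t1, t2):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (removal from the candidate list); both are assumed here.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Every remaining point within t1 of the center joins this canopy.
        members = [i for i in remaining
                   if dist(points[center], points[i]) <= t1]
        canopies.append((center, members))
        # Points within t2 of the center can no longer start or join a canopy.
        remaining = [i for i in remaining
                     if dist(points[center], points[i]) > t2]
    return canopies


points = [(0, 0), (0.5, 0), (1.5, 0), (10, 10), (10.5, 10)]
canopies = canopy(points, t1=2.0, t2=1.0)
```

Each canopy is cheap to compute, so a more expensive algorithm such as DBSCAN only needs to compare points that share a canopy, which is one way the data volume seen by the later step can be reduced.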

Highlights

  • Data mining is an interdisciplinary topic that can be defined in many different ways [1]

  • We were interested in the incremental version of Density-based spatial clustering of applications with noise (DBSCAN) since (1) it can discover clusters of arbitrary shape; (2) it requires just two parameters and is mostly insensitive to the ordering of the points in the database; (3) it reduces the search space and facilitates incremental updates of the clusters; (4) it adapts to various datasets and data spaces without initial information [8]; and (5) DBSCAN with the incremental concept saves considerable time and effort, whereas static DBSCAN suffers from several drawbacks that arise mainly in large, dynamic databases [9]. In this paper, we propose the Adaptive Median Filtering (AMF)-IDBSCAN algorithm, an enhanced version of DBSCAN

  • To overcome the high complexity and poor scalability of traditional clustering algorithms, we have developed in this work AMF-IDBSCAN: an enhanced incremental DBSCAN using a canopy clustering algorithm and an adaptive median filtering technique


Summary

Introduction

Data mining is an interdisciplinary topic that can be defined in many different ways [1]. The difference between traditional (batch-mode) clustering methods and incremental clustering is the ability of the latter to process new data added to the data collection without performing a full re-clustering. This allows updates to the database to be followed dynamically during clustering. We were interested in the incremental version of DBSCAN since (1) it can discover clusters of arbitrary shape; (2) it requires just two parameters and is mostly insensitive to the ordering of the points in the database; (3) it reduces the search space and facilitates incremental updates of the clusters; (4) it adapts to various datasets and data spaces without initial information [8]; and (5) DBSCAN with the incremental concept saves considerable time and effort, whereas static DBSCAN suffers from several drawbacks that arise mainly in large, dynamic databases [9]. In this paper, we propose the AMF-IDBSCAN algorithm, an enhanced version of DBSCAN.
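
As a point of reference for the two parameters mentioned above, here is a minimal sketch of the classical static DBSCAN, assuming the usual names `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size) and a brute-force neighbor search. It is neither the incremental variant nor the AMF-IDBSCAN algorithm proposed in the paper; the data and parameter values are made up for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep growing
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this toy data the two dense squares form two clusters and the isolated point is labeled as noise. A batch algorithm like this has to be rerun from scratch when points are added, which is the cost that incremental clustering aims to avoid.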

Related work
The proposed AMF-IDBSCAN clustering algorithm
Pre-clustering
Classical static DBSCAN clustering algorithm
Incremental DBSCAN clustering
Procedure
Post-clustering
Performance evaluation
Run the incremental DBSCAN
Experiments and results
Chefrour et al IDBSCAN
Findings
Conclusion and perspectives