Abstract

Data clustering is a field of unsupervised learning that has been extensively used and discussed in research papers and scientific studies. It addresses several data analysis problems by grouping similar entities into the same set. Many clustering algorithms have been developed to date, using techniques that include centroid-based, density-based and dendrogram-based approaches. More than 100 distinct algorithms exist today, along with many enhancements of each. Consequently, data scientists still struggle to find the best clustering method among this diversity of techniques. In this paper we present a survey of the DBSCAN algorithm and its enhancements with respect to time requirements. A comparison of DBSCAN versions is also presented to help data scientists decide which version of DBSCAN to use.

Highlights

  • The fast development of the internet and the availability of cheap mobile devices, smart sensors and social network applications allow users to generate huge amounts of data continuously

  • Many clustering algorithms have been developed to help data scientists analyse and understand data despite its volume. These algorithms play a crucial role in several sophisticated systems and applications, including recommender systems, medical applications, face recognition, environmental assessment and anomaly detection [1][2][3][4][5]

  • Since complexity is a key criterion for judging the efficiency of an algorithm, the rest of this paper surveys some well-cited DBSCAN extensions that significantly affect the time requirement

Summary

INTRODUCTION

The fast development of the internet and the availability of cheap mobile devices, smart sensors and social network applications allow users to generate huge amounts of data continuously. This rapid increase in data volume makes several domains difficult to understand using human capabilities alone. Many clustering algorithms have been developed to help data scientists analyse and understand data despite its volume. These algorithms must handle the complications that such data brings: supporting growth in volume, improving scalability, processing high-dimensional spaces, dealing with arbitrarily shaped structures and detecting outliers. For example, DBSCAN [11] improves scalability and deals with heavily skewed data, while HDBSCAN [12] addresses the problem of choosing initial parameters.

CLUSTERING TECHNIQUES
Partitional Clustering
Hierarchical Clustering
DBSCAN Algorithm
DBSCAN COMPLEXITY ENHANCEMENTS
A Linear DBSCAN
CONCLUSION