Abstract
Density-based spatial clustering of applications with noise (DBSCAN) is regarded as a pioneering algorithm of the density-based clustering technique. It can handle outlier objects, detect clusters of different shapes, and operate without prior knowledge about the clusters existing in a dataset. These features, along with its simple approach, have made it widely applicable in many areas of science. However, DBSCAN still has limitations in terms of performance, its ability to detect clusters of varying densities, and its dependence on user input parameters. Multiple DBSCAN-inspired algorithms have subsequently been proposed to alleviate these and other problems of the algorithm. In this paper, the implementation, features, strengths, and drawbacks of DBSCAN are thoroughly examined. The successive algorithms proposed to improve on the original DBSCAN are classified based on their motivations and discussed. Experimental tests were conducted on C++ implementations of these algorithms, along with the original DBSCAN algorithm, to understand and compare the changes they introduce. Finally, an analytical evaluation based on the results is presented.
Highlights
Grouping of data objects is a necessary task in a wide range of studies such as medical diagnosis, civil engineering [2], academics [3], biology [4], [5], and networking
This paper examines the definitions introduced by the density-based clustering method and how the original density-based spatial clustering of applications with noise (DBSCAN) procedure works, identifies the challenges DBSCAN faces, and discusses the subsequent density-based clustering algorithms inspired by it
Several density-based algorithms have been proposed that aim to improve on the original DBSCAN method
Summary
Grouping of data objects is a necessary task in a wide range of studies such as medical diagnosis (for example, clustering algorithms for identifying cancerous data [1]), civil engineering [2], academics [3], biology [4], [5], and networking (for example, clustering algorithms in wireless sensor network-based applications [6], [7]). Depending on the information provided by a dataset, this grouping can be carried out as supervised or unsupervised classification. In unsupervised classification, the process of grouping a given dataset into classes is called “clustering” [13]–[17].
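To make the DBSCAN procedure referred to in the abstract concrete, the following is a minimal C++ sketch of the classic algorithm. It is an illustration only and not the implementation evaluated in the paper. It assumes two-dimensional points, Euclidean distance, and a brute-force neighborhood search. The names `Point`, `region_query`, `dbscan`, `kNoise`, and `kUnvisited` are hypothetical, and `eps` and `min_pts` stand for the two user input parameters mentioned above (neighborhood radius and minimum neighborhood size).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D point type used only for this sketch.
struct Point { double x, y; };

constexpr int kUnvisited = -2;  // point not yet processed
constexpr int kNoise = -1;      // point labeled as an outlier

// Return the indices of all points within distance eps of points[i],
// including i itself. Brute force, so O(n) per query.
std::vector<std::size_t> region_query(const std::vector<Point>& points,
                                      std::size_t i, double eps) {
    std::vector<std::size_t> result;
    for (std::size_t j = 0; j < points.size(); ++j) {
        const double dx = points[i].x - points[j].x;
        const double dy = points[i].y - points[j].y;
        if (dx * dx + dy * dy <= eps * eps) result.push_back(j);
    }
    return result;
}

// Assign a cluster id (0, 1, ...) or kNoise to every point.
std::vector<int> dbscan(const std::vector<Point>& points,
                        double eps, std::size_t min_pts) {
    assert(eps > 0.0 && min_pts >= 1);
    std::vector<int> labels(points.size(), kUnvisited);
    int cluster = 0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        if (labels[i] != kUnvisited) continue;
        std::vector<std::size_t> seeds = region_query(points, i, eps);
        if (seeds.size() < min_pts) {
            labels[i] = kNoise;  // may later be relabeled as a border point
            continue;
        }
        labels[i] = cluster;  // i is a core point: start a new cluster
        // seeds grows while we iterate, so index-based access is required.
        for (std::size_t k = 0; k < seeds.size(); ++k) {
            const std::size_t q = seeds[k];
            if (labels[q] == kNoise) labels[q] = cluster;  // border point
            if (labels[q] != kUnvisited) continue;         // already assigned
            labels[q] = cluster;
            const std::vector<std::size_t> nb = region_query(points, q, eps);
            // Only core points extend the cluster further.
            if (nb.size() >= min_pts) {
                seeds.insert(seeds.end(), nb.begin(), nb.end());
            }
        }
        ++cluster;
    }
    return labels;
}
```

As a usage example, calling `dbscan(points, 1.5, 3)` on two well-separated groups of four points plus one distant point should assign the two groups different cluster ids and label the distant point as `kNoise`. The brute-force `region_query` makes the sketch quadratic in the number of points, which illustrates the performance limitation noted in the abstract; a spatial index would be the usual remedy.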