Abstract

We present a new algorithm for the widely used density-based clustering method dbscan. For a set of n points in ℝ², our algorithm computes the dbscan clustering in O(n log n) time, irrespective of the scale parameter ε (and assuming the second parameter MinPts is set to a fixed constant, as is the case in practice). Experiments show that the new algorithm is not only fast in theory, but that a slightly simplified version is competitive in practice and much less sensitive to the choice of ε than the original dbscan algorithm. We also present an O(n log n) randomized algorithm for hdbscan in the plane (hdbscan is a hierarchical version of dbscan introduced recently), and we show how to compute an approximate version of hdbscan in near-linear time in any fixed dimension.

Highlights

  • Clustering is one of the most fundamental tasks in data mining

  • One of the most popular density-based clustering methods is dbscan; the paper by Ester et al. [12] on dbscan has been cited over 8,800 times, and in 2014 dbscan received the test-of-time award from KDD, a leading data-mining conference. dbscan has two parameters, ε and MinPts, that together determine when the density around a point p ∈ D is high enough for p to be part of a cluster as opposed to being noise; see Section 2 for a precise definition of the dbscan clustering

  • We present an O(n log n) algorithm to compute the dbscan clustering for a set D of n points in the plane, irrespective of the setting of the parameter ε used to define the dbscan clustering
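
To make the roles of ε and MinPts concrete, here is a minimal brute-force sketch in Python. It is not the O(n log n) algorithm of the paper: it takes quadratic time and only illustrates the definition. It assumes the standard dbscan notion of a core point (at least MinPts points of D, the point itself included, within distance ε) and treats clusters of core points as connected components under "within distance ε". The function names `core_points` and `cluster_core_points` are hypothetical and chosen for this illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def core_points(points, eps, min_pts):
    """Indices of points with at least min_pts points (itself included) within eps."""
    core = []
    for i, p in enumerate(points):
        neighbours = sum(1 for q in points if dist(p, q) <= eps)
        if neighbours >= min_pts:
            core.append(i)
    return core


def cluster_core_points(points, eps, min_pts):
    """Map each core-point index to a cluster id.

    Two core points share a cluster when they are linked by a chain of
    core points whose consecutive members are within distance eps.
    Assumption: this is the standard definition, not code from the paper.
    """
    core = core_points(points, eps, min_pts)
    label = {}
    next_id = 0
    for start in core:
        if start in label:
            continue
        label[start] = next_id
        stack = [start]
        while stack:  # depth-first search over the core points
            i = stack.pop()
            for j in core:
                if j not in label and dist(points[i], points[j]) <= eps:
                    label[j] = next_id
                    stack.append(j)
        next_id += 1
    return label


# Two tight groups of three points and one isolated point.
D = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
core = core_points(D, eps=1.5, min_pts=3)
clusters = cluster_core_points(D, eps=1.5, min_pts=3)
```

In this example the six grouped points are core points that form two clusters, while the isolated point (5, 5) has too few neighbours to be a core point.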

Summary

Introduction

Clustering is one of the most fundamental tasks in data mining. Because clustering matters in a wide variety of applications, the clustering problem comes in many variants.

Gan and Tao [13] extended the work from an earlier master’s thesis to ℝ^d, resulting in an exact algorithm for dbscan in ℝ^d. They also show that a δ-approximate dbscan clustering can be computed in O(n/δ^(d−1)) expected time, using a modified version of the exact algorithm. Their approximate clustering is the same as the approximate clustering defined by Chen et al. [9], who had already shown how to compute it in O(n log n + n/δ^(d−1)) time.

The original dbscan algorithm constructs clusters one by one and assigns a border point p to the first cluster that finds p; we instead assign border points to the cluster of their nearest core point.
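
The nearest-core-point rule for border points can be sketched as follows. This is an illustrative brute-force version, not the paper's implementation. It assumes that the core points and their cluster ids are already known (passed in as the hypothetical dictionary `core_label`), and that a non-core point is a border point only if some core point lies within distance ε, which is the standard definition. The name `assign_border_points` and the tie-breaking rule are likewise assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used here for points that belong to no cluster


def assign_border_points(points, core_label, eps):
    """Return one label per point.

    core_label maps the index of each core point to its cluster id.
    A non-core point takes the cluster of its nearest core point if that
    core point is within eps; otherwise it is labelled NOISE.
    Ties in distance are broken by the smaller cluster id (an assumption).
    """
    labels = []
    for i, p in enumerate(points):
        if i in core_label:
            labels.append(core_label[i])
            continue
        within = [
            (dist(p, points[j]), cid)
            for j, cid in core_label.items()
            if dist(p, points[j]) <= eps
        ]
        labels.append(min(within)[1] if within else NOISE)
    return labels


# Point 2 is within eps of both core points but closer to the second one.
pts = [(0.0, 0.0), (3.0, 0.0), (1.8, 0.0), (10.0, 10.0)]
labels = assign_border_points(pts, core_label={0: 0, 1: 1}, eps=2.0)
```

Because the label depends only on distances, the result does not change with the order in which clusters are built, unlike the first-come assignment of the original algorithm.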

A fast algorithm for DBScan
A fast algorithm for HDBScan in the plane
Approximate HDBScan