Faster DBSCAN and HDBSCAN in Low-Dimensional Euclidean Spaces

Mark De Berg, Ade Gunawan, Marcel Roeloffzen

doi:10.1142/s0218195919400028

Abstract

We present a new algorithm for the widely used density-based clustering method dbscan. For a set of n points in ℝ², our algorithm computes the dbscan clustering in O(n log n) time, irrespective of the scale parameter ε (and assuming the second parameter, MinPts, is set to a fixed constant, as is the case in practice). Experiments show that the new algorithm is not only fast in theory, but that a slightly simplified version is competitive in practice and much less sensitive to the choice of ε than the original dbscan algorithm. We also present an O(n log n) randomized algorithm for hdbscan in the plane (hdbscan is a hierarchical version of dbscan introduced recently), and we show how to compute an approximate version of hdbscan in near-linear time in any fixed dimension.
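To make "hierarchical version of dbscan" concrete, here is a minimal sketch of the standard mutual-reachability formulation of hdbscan (due to Campello et al., assumed here; it is not the authors' O(n log n) randomized algorithm). The core distance of a point is the distance to its MinPts-th nearest neighbour, counting the point itself (an assumed convention). The hierarchy is encoded by a minimum spanning tree of the mutual-reachability graph, and cutting that tree at level ε yields the dbscan* clustering for that ε. All function names are hypothetical, and the quadratic Prim's algorithm is chosen for brevity only.

```python
import math

def core_distances(points, min_pts):
    """Distance from each point to its min_pts-th nearest neighbour,
    counting the point itself (so min_pts=1 gives 0). Naive O(n^2 log n)."""
    return [sorted(math.dist(p, q) for q in points)[min_pts - 1] for p in points]

def mutual_reachability_mst(points, min_pts):
    """Prim's algorithm, O(n^2), on the complete graph with weights
    w(p, q) = max(core(p), core(q), |pq|). Returns (edges, core_dists),
    where edges is a list of (weight, i, j)."""
    n = len(points)
    cd = core_distances(points, min_pts)

    def w(i, j):
        return max(cd[i], cd[j], math.dist(points[i], points[j]))

    in_tree = [False] * n
    best = [math.inf] * n    # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the cheapest vertex not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and w(u, v) < best[v]:
                best[v] = w(u, v)
                parent[v] = u
    return edges, cd

def dbscan_star_at(points, min_pts, eps):
    """Cut the hierarchy at level eps: points whose core distance exceeds eps
    are noise (-1); the rest are grouped by MST edges of weight <= eps."""
    edges, cd = mutual_reachability_mst(points, min_pts)
    n = len(points)
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x

    for weight, i, j in edges:
        if weight <= eps:
            root[find(i)] = find(j)
    labels, ids = [], {}
    for i in range(n):
        labels.append(-1 if cd[i] > eps else ids.setdefault(find(i), len(ids)))
    return labels
```

Because the tree is computed once, every ε gives a clustering by a cheap cut. For example, `dbscan_star_at(pts, 3, 1.5)` and `dbscan_star_at(pts, 3, 1.2)` reuse the same construction, which is what makes the hierarchy independent of a single ε.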

Highlights

Clustering is one of the most fundamental tasks in data mining.
One of the most popular density-based clustering methods is dbscan; the paper by Ester et al. [12] on dbscan has been cited over 8,800 times, and in 2014 dbscan received the test-of-time award from KDD, a leading data-mining conference. dbscan has two parameters, ε and MinPts, that together determine when the density around a point p ∈ D is high enough for p to be part of a cluster as opposed to being noise; see Section 2 for a precise definition of the dbscan clustering.
We present an O(n log n) algorithm to compute the dbscan clustering for a set D of n points in the plane, irrespective of the setting of the parameter ε used to define the dbscan clustering.
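The roles of ε and MinPts can be illustrated with a brute-force reference implementation of the usual dbscan definition. This is an O(n²) sketch for checking the definition, not the O(n log n) algorithm described above. It assumes the common convention that a point's ε-neighbourhood includes the point itself. A point is core if its neighbourhood has at least MinPts points; clusters are connected groups of core points within distance ε of each other; a non-core point within ε of some core point is a border point, and every other point is noise. The function name is hypothetical, and border points are left unlabelled because implementations differ in how they assign them.

```python
import math

def dbscan_core_and_clusters(points, eps, min_pts):
    """Naive O(n^2) dbscan reference.
    Returns (kind, label): kind[i] is 'core', 'border' or 'noise';
    label[i] is the cluster id of core point i, otherwise None."""
    n = len(points)
    # eps-neighbourhoods, each including the point itself
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in nbrs]

    label = [None] * n
    next_id = 0
    for s in range(n):
        if not is_core[s] or label[s] is not None:
            continue
        # flood-fill through core points that lie within eps of each other
        label[s] = next_id
        stack = [s]
        while stack:
            u = stack.pop()
            for v in nbrs[u]:
                if is_core[v] and label[v] is None:
                    label[v] = next_id
                    stack.append(v)
        next_id += 1

    kind = []
    for i in range(n):
        if is_core[i]:
            kind.append('core')
        elif any(is_core[j] for j in nbrs[i]):
            kind.append('border')
        else:
            kind.append('noise')
    return kind, label
```

The quadratic neighbourhood computation is exactly the cost that a fast algorithm must avoid, and it grows worse as ε grows and neighbourhoods get larger.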

Summary

Introduction

Clustering is one of the most fundamental tasks in data mining. Due to the wide variety of applications where clustering is important, the clustering problem comes in many variants. Gan and Tao [13] published a paper in which they extend the work from the master's thesis to ℝ^d, resulting in an algorithm for dbscan with a running time of […]. Gan and Tao also show that a δ-approximate dbscan clustering can be computed in O(n/δ^(d−1)) expected time, using a modified version of the exact algorithm. Their approximate clustering is the same as the approximate clustering defined by Chen et al. [9], who had already shown how to compute it in O(n log n + n/δ^(d−1)) time. The original dbscan algorithm constructs clusters one by one and assigns a border point p to the first cluster that finds p; we instead assign border points to the cluster of their nearest core point.
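The nearest-core-point rule for border points can be sketched as follows, assuming the core points and their cluster labels have already been computed. The function name and the use of -1 for noise are hypothetical choices, and the scan over all core points is quadratic. The point of the sketch is that, unlike the first-found rule of the original algorithm, the result does not depend on the order in which clusters are expanded.

```python
import math

def assign_border_points(points, is_core, core_label, eps):
    """Give every non-core point the cluster of its nearest core point,
    provided that core point lies within eps; otherwise mark it noise (-1).
    Core points keep their own label. Naive O(n^2) scan."""
    core_idx = [i for i, c in enumerate(is_core) if c]
    labels = []
    for i, p in enumerate(points):
        if is_core[i]:
            labels.append(core_label[i])
            continue
        best_label, best_d = -1, math.inf
        for j in core_idx:
            d = math.dist(p, points[j])
            if d <= eps and d < best_d:   # nearest core point within eps wins
                best_label, best_d = core_label[j], d
        labels.append(best_label)
    return labels
```

For example, with core points (0, 0) in cluster 0 and (4, 0) in cluster 1 and ε = 2.5, the point (1.8, 0) is within ε of both. The first-found rule could put it in either cluster depending on which one is expanded first, whereas the rule above always picks cluster 0.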

A fast algorithm for DBScan

A fast algorithm for HDBScan in the plane

Approximate HDBScan

Journal: International Journal of Computational Geometry & Applications
Publication Date: Mar 1, 2019
Citations: 9
License type: cc-by

Similar Papers

Faster DBScan and HDBScan in Low-Dimensional Euclidean Spaces
...
01 Dec 2017

Unsupervised Learning of Shape Manifolds
N Rajpoot ... A H Bhalerao
01 Jan 2007

Ultra-low-dimensional embeddings for doubling metrics
T.-H Hubert Chan ... Kunal Talwar
Journal of the ACM | VOL. 57
01 Apr 2010

Ultra-low-dimensional embeddings for doubling metrics
...
01 Jan 2008