Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms

Guanlin Mo, Shihong Song, Hu Ding

doi:10.1145/3654981

Abstract

DBSCAN is a popular density-based clustering algorithm with many applications in practice. However, in high-dimensional space or in a general metric space (e.g., clustering a set of texts under edit distance), the running time of DBSCAN can be as large as quadratic in the input size. Moreover, most existing acceleration techniques for DBSCAN apply only to low-dimensional Euclidean space. In this paper, we study the DBSCAN problem under the assumption that the inliers (the core points and border points) have a low intrinsic dimension, which is a realistic assumption for many high-dimensional applications, while the outliers can be located anywhere in the space without any assumption. First, we propose a k-center clustering based algorithm that reduces the time-consuming labeling and merging tasks of DBSCAN to linear time. Further, we propose a linear-time approximate DBSCAN algorithm whose key idea is to build a novel small-size summary of the core points. Our algorithm can also be implemented efficiently for streaming data, with memory usage independent of the input size. Finally, we conduct experiments comparing our algorithms with several popular DBSCAN algorithms. The experimental results suggest that our proposed approach can significantly reduce the computational complexity in practice.
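The abstract does not spell out the algorithms, so the Python block below is only a hypothetical sketch of the general setting it describes, not the authors' method. It runs exact DBSCAN over a general metric (here Levenshtein edit distance, matching the abstract's text-clustering example) and uses a k-center partition, computed with Gonzalez's greedy farthest-point heuristic, together with the triangle inequality to skip distance evaluations that cannot fall within eps. All names (`edit_distance`, `gonzalez_centers`, `metric_dbscan`, and the parameter `k`) are illustrative assumptions. The sketch keeps DBSCAN's quadratic worst case and does not implement the paper's linear-time labeling and merging, its core-point summary, or its streaming variant.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
NOISE = -1


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs; a metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def gonzalez_centers(points: Sequence[T], k: int, dist: Callable[[T, T], float]
                     ) -> Tuple[List[int], List[int], List[float]]:
    """Greedy farthest-point k-center (Gonzalez). Returns the center indices,
    each point's center slot, and each point's distance to that center."""
    centers = [0]
    d_to_c = [dist(p, points[0]) for p in points]
    assign = [0] * len(points)
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=lambda i: d_to_c[i])
        if d_to_c[far] == 0:
            break  # every point already coincides with a center
        centers.append(far)
        slot = len(centers) - 1
        for i, p in enumerate(points):
            d = dist(p, points[far])
            if d < d_to_c[i]:
                d_to_c[i], assign[i] = d, slot
    return centers, assign, d_to_c


def metric_dbscan(points: Sequence[T], eps: float, min_pts: int,
                  dist: Callable[[T, T], float], k: int = 8) -> List[int]:
    """Exact DBSCAN in a metric space: one cluster id per point, -1 = noise.
    min_pts counts the query point itself."""
    n = len(points)
    if n == 0:
        return []
    centers, assign, d_to_c = gonzalez_centers(points, k, dist)
    members: List[List[int]] = [[] for _ in centers]
    for i, slot in enumerate(assign):
        members[slot].append(i)
    radius = [max(d_to_c[i] for i in m) for m in members]

    def neighbors(i: int) -> List[int]:
        q, out = points[i], []
        for slot, m in enumerate(members):
            dqc = dist(q, points[centers[slot]])
            if dqc - radius[slot] > eps:
                continue  # triangle inequality: all members are farther than eps
            for j in m:
                if abs(dqc - d_to_c[j]) > eps:
                    continue  # lower bound |d(q,c) - d(p,c)| already exceeds eps
                if dist(q, points[j]) <= eps:
                    out.append(j)
        return out

    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point: keep expanding
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]
```

For example, `metric_dbscan(["cat", "bat", "hat", "rat", "dog", "dig", "dug", "zzzzzz"], eps=1, min_pts=3, dist=edit_distance, k=3)` returns `[0, 0, 0, 0, 1, 1, 1, -1]`: the two groups of words that are one edit apart form clusters, and `"zzzzzz"` is noise. The pruning is exact, because it only discards points whose triangle-inequality lower bound already exceeds eps, so every eps-neighborhood is computed correctly for any `k`. A larger `k` spends more distance computations on centers in exchange for tighter bounds.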

More From: Proceedings of the ACM on Management of Data

Similar Papers

2DBSCAN with Local Outlier Detection
Urja Pandya ... Anjana Rathwa
01 Jan 2020

Non-iterative border-peeling clustering algorithm based on swap strategy
Hui Tu ... Ling Ding
Information Sciences | VOL. 654
04 Nov 2023

A Contextual Normalised Edit Distance
Colin De La Higuera ... Luisa Micó
01 Apr 2008
