Abstract

DBSCAN is one of clustering algorithms which can report arbitrarily-shaped clusters and noises without requiring the number of clusters as a parameter (unlike the other clustering algorithms, k-means, for example). Because the running time of DBSCAN has quadratic order of growth, i.e. O(n2), research studies on improving its performance have been received a considerable amount of attention for decades. Grid-based DBSCAN is a well-developed algorithm whose complexity is improved to O(nlog n) in 2D space, while requiring Ω(n4/3) to solve when dimension ≥ 3. However, we find that Grid-based DBSCAN suffers from two problems: neighbour explosion and redundancies in merging, which make the algorithms infeasible in high dimensional space. In this paper we first propose a novel algorithm called GDCF which utilizes bitmap indexing to support efficient neighbour grid queries. Second, based on the concept of union-find algorithm we devise a forest-like structure, called cluster forest, to alleviate the redundancies in the merging. Moreover, we find that running the cluster forest in different orders can lead to a different number of merging operations needed to perform in the merging step. We propose to perform the merging step in a uniform random order to optimize the number of merging operations. However, for high-density database, a bottleneck could be occurred, we further propose a low-density-first order to alleviate this bottleneck. The experiments resulted on both real-world and synthetic datasets demonstrate that the proposed algorithm outperforms the state-of-the-art exact/approximate DBSCAN and suggests a good scalability.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.