Efficient Computation and Visualization of Multiple Density-Based Clustering Hierarchies

Antonio Cavalcante Araujo Neto, Ricardo J G B Campello, Jorg Sander, Mario A Nascimento

doi:10.1109/tkde.2019.2962412

Open Access

https://doi.org/10.1109/tkde.2019.2962412

Abstract

HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While a small change in mpts typically leads to a small change in the clustering structure, choosing a “good” mpts value can be challenging: depending on the data distribution, a high or low mpts value may be more appropriate, and certain clusters may reveal themselves at different values. To explore results for a range of mpts values, one has to run HDBSCAN* for each value independently, which can be computationally impractical. In this paper, we propose an approach to efficiently compute all HDBSCAN* hierarchies for a range of mpts values by building upon results from computational geometry to replace HDBSCAN*'s complete graph with a smaller equivalent graph. An experimental evaluation shows that our approach can obtain over one hundred hierarchies for the computational cost equivalent to running HDBSCAN* about twice, which corresponds to a speedup of more than 60 times, compared to running HDBSCAN* independently that many times. We also propose a series of visualizations that allow users to analyze a collection of hierarchies for a range of mpts values, along with case studies that illustrate how these analyses are performed.
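One ingredient that can be shared across a whole range of mpts values is the core distance. The Python sketch below illustrates this under the usual HDBSCAN* definitions, which this page does not spell out and which are therefore assumptions here: the core distance of a point is the distance to its mpts-th nearest neighbor (counting the point itself), and the mutual reachability distance of two points is the maximum of their two core distances and their distance. The function names and the `k_max` parameter are hypothetical, and this brute-force version is not the paper's algorithm.

```python
import math


def core_distances_for_range(points, k_max):
    """Return core[m][i] for m = 1..k_max.

    Assumption: core[m][i] is the distance from point i to its m-th nearest
    neighbor, counting the point itself as the first neighbor.
    """
    n = len(points)
    if not 1 <= k_max <= n:
        raise ValueError("k_max must be between 1 and the number of points")
    core = {m: [0.0] * n for m in range(1, k_max + 1)}
    for i, p in enumerate(points):
        # One sorted distance list serves every mpts value in the range.
        dists = sorted(math.dist(p, q) for q in points)
        for m in range(1, k_max + 1):
            core[m][i] = dists[m - 1]
    return core


def mutual_reachability(points, core_m, i, j):
    """Mutual reachability distance of points i and j for one fixed mpts."""
    return max(core_m[i], core_m[j], math.dist(points[i], points[j]))


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
core = core_distances_for_range(points, k_max=3)
for m_pts in (2, 3):
    print(m_pts, mutual_reachability(points, core[m_pts], 0, 1))
```

The sorted neighbor lists are computed once, so adding another mpts value to the range costs only a lookup per point. What remains expensive in a naive approach is building one spanning tree per mpts value over the complete graph, which is the part the abstract says is replaced by a smaller equivalent graph.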

Highlights

The discovery of groups within datasets plays an important role in the exploration and analysis of data.
For the EMST, there are known results from computational geometry that relate the EMST to the Delaunay Triangulation (DT), the Gabriel Graph (GG) and the Relative Neighborhood Graph (RNG) in the following way [32]: $\mathrm{EMST} \subseteq \mathrm{RNG} \subseteq \mathrm{GG} \subseteq \mathrm{DT}$. (3)
In this paper, we presented RNG-HDBSCAN*, an efficient strategy for computing multiple density-based clustering hierarchies.
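The inclusion chain in the second highlight (the EMST is a subgraph of the RNG, which is a subgraph of the GG) can be checked numerically on a small random point set. The sketch below is a brute-force illustration in plain Euclidean space, not the paper's implementation; all function names are hypothetical. It uses the standard textbook definitions of the two proximity graphs, and it leaves out the Delaunay Triangulation because that would need a triangulation routine.

```python
import itertools
import random


def sq_dist(p, q):
    """Squared Euclidean distance; it preserves order, which is all we compare."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def emst_edges(pts):
    """Euclidean minimum spanning tree via Prim's algorithm on the complete graph."""
    n = len(pts)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] != -1:
            edges.add(tuple(sorted((u, parent[u]))))
        for v in range(n):
            w = sq_dist(pts[u], pts[v])
            if not in_tree[v] and w < best[v]:
                best[v], parent[v] = w, u
    return edges


def proximity_edges(pts, blocked):
    """Keep edge (i, j) unless some third point k blocks it."""
    n = len(pts)
    edges = set()
    for i, j in itertools.combinations(range(n), 2):
        d_ij = sq_dist(pts[i], pts[j])
        others = (k for k in range(n) if k != i and k != j)
        if not any(blocked(sq_dist(pts[i], pts[k]), sq_dist(pts[j], pts[k]), d_ij)
                   for k in others):
            edges.add((i, j))
    return edges


def rng_edges(pts):
    # Blocked if some k is strictly closer to both endpoints than they are to each other.
    return proximity_edges(pts, lambda d_ik, d_jk, d_ij: max(d_ik, d_jk) < d_ij)


def gabriel_edges(pts):
    # Blocked if some k lies strictly inside the circle whose diameter is segment ij.
    return proximity_edges(pts, lambda d_ik, d_jk, d_ij: d_ik + d_jk < d_ij)


gen = random.Random(0)
pts = [(gen.random(), gen.random()) for _ in range(40)]
emst, rng, gg = emst_edges(pts), rng_edges(pts), gabriel_edges(pts)
print(len(emst), len(rng), len(gg))
print(emst <= rng <= gg)  # set inclusion check
```

The practical point of the chain is size: the complete graph on n points has n(n-1)/2 edges, while the proximity graphs are typically far sparser yet still contain every EMST edge, so a spanning tree can be extracted from them instead.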

Summary

Introduction

The discovery of groups within datasets plays an important role in the exploration and analysis of data. Among density-based clustering methods, HDBSCAN* is the current state of the art: it computes a hierarchy of nested clusters that represents clusters at different density levels. It generalizes and improves several aspects of previous algorithms, and allows for a comprehensive framework for cluster analysis, visualization, and unsupervised outlier detection [9]. HDBSCAN* stands out for its ability to detect arbitrarily shaped clusters and noise, as well as for building a hierarchical organization of cluster structures rather than finding a single flat partitioning of the data. It can be considered a practical and theoretical generalization of its predecessors (DBSCAN and OPTICS). These characteristics allow us to devise and prove the correctness of the strategy proposed in this work.
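The idea of nested clusters at different density levels can be illustrated with a small Python sketch. It assumes the standard HDBSCAN* view, which is not stated on this page: at a level eps, points whose core distance exceeds eps are noise, and the remaining points are grouped whenever their mutual reachability distance is at most eps. The function name `clusters_at_level` and the toy data are hypothetical, and the brute-force pairwise loop is only for illustration.

```python
import math


def clusters_at_level(points, m_pts, eps):
    """Return the clusters (lists of point indices) at density level eps."""
    n = len(points)
    # Core distance: distance to the m_pts-th nearest neighbor, self included.
    core = [sorted(math.dist(p, q) for q in points)[m_pts - 1] for p in points]
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            d_mreach = max(core[i], core[j], math.dist(points[i], points[j]))
            if d_mreach <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        if core[i] <= eps:  # points with a larger core distance are noise
            groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0), (30, 0)]
for eps in (1.0, 8.0, 18.0):
    print(eps, clusters_at_level(points, m_pts=2, eps=eps))
```

As eps grows, the two tight groups first appear separately, then merge, and finally absorb the isolated point, so each level's clusters are contained in the next level's. A single flat partitioning would have to commit to just one of these levels.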

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Jan 10, 2020
Citations: 10
License type: CC BY 4.0

Similar Papers

Revealing Density-Based Clustering Structure from the Core-Connected Tree of a Network
Jianbin Huang ... Hongbo Deng
IEEE Transactions on Knowledge and Data Engineering | VOL. 25
01 Aug 2013

K-nearest uphill clustering in the protein structure space
Xuefeng Cui ... Xin Gao
Neurocomputing | VOL. 220
24 Aug 2016

Efficient Computation of Multiple Density-Based Clustering Hierarchies
Antonio Cavalcante Araujo Neto ... Joerg Sander
01 Nov 2017

D-NND: A Hierarchical Density Clustering Method via Nearest Neighbor Descent
Teng Qiu ... Chaoyi Li
01 Aug 2018
