Abstract
We present a graph-theoretical approach to data clustering, which combines the creation of a graph from the data with Markov Stability, a multiscale community detection framework. We show how the multiscale capabilities of the method allow the estimation of the number of clusters, as well as alleviating the sensitivity to the parameters in graph construction. We use both synthetic and benchmark real datasets to compare and evaluate several graph construction methods and clustering algorithms, and show that multiscale graph-based clustering achieves improved performance compared to popular clustering methods without the need to set externally the number of clusters.
Highlights
Clustering is a classic task in data mining, whereby input data are organised into groups such that data points within a group are more similar to each other than to those outside the group (Xu and Wunsch 2005)
We evaluate several geometric graph constructions, from methods that use only local distances to others that balance local and global measures, and find that the recently proposed Continuous k-nearest neighbours (CkNN) graph (Berry and Sauer 2019) performs well for graph-based data clustering via community detection
We have investigated the use of multiscale community detection for graph-based data clustering
Summary
Clustering is a classic task in data mining, whereby input data are organised into groups (or clusters) such that data points within a group are more similar to each other than to those outside the group (Xu and Wunsch 2005). We evaluate several geometric graph constructions, from methods that use only local distances to others that balance local and global measures, and find that the recently proposed Continuous k-nearest neighbours (CkNN) graph (Berry and Sauer 2019) performs well for graph-based data clustering via community detection.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.