Abstract

Clustering aims at discovering the natural groupings in a dataset and is prevalent in many disciplines that involve multivariate data analysis. In this paper, we propose a physically inspired graph-theoretical clustering method, which first organizes the data points into an attractive graph, called the In-Tree, via a physically inspired rule, called Nearest Descent (ND). The ND rule selects, for each node, the nearest node in the descending direction of potential as its parent node, which is fundamentally different from classical Gradient Descent. The constructed In-Tree proves a very good candidate for clustering due to its particular features and properties. In the In-Tree, the original clustering problem is reduced to the problem of removing the inter-cluster edges from the graph. Pleasingly, these inter-cluster edges are usually so distinguishable that they can be easily determined by different automatic edge-cutting methods. We also propose a visualized strategy to validate the effectiveness of these automatic edge-cutting methods. The experimental results show that the proposed method is superior to related clustering methods. The results also reveal the characteristics of the different automatic cutting methods and the value of the visualized strategy in increasing the reliability of the clustering results in practice.
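The ND rule described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the potential is assumed here to be the negative of a Gaussian-kernel density estimate, and the kernel width `sigma` is a hypothetical parameter; the paper's exact potential definition may differ.

```python
import numpy as np

def nearest_descent(X, sigma=1.0):
    """Sketch of the Nearest Descent (ND) rule: each point's parent is
    its nearest neighbor among the points of lower potential; minimum-
    potential points remain their own parents (roots of the In-Tree)."""
    n = len(X)
    # Pairwise squared distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Assumed potential: negative Gaussian-kernel density estimate.
    potential = -np.exp(-d2 / (2 * sigma ** 2)).sum(axis=1)
    parent = np.arange(n)
    for i in range(n):
        lower = np.where(potential < potential[i])[0]
        if lower.size:  # nearest point in the descending direction
            parent[i] = lower[np.argmin(d2[i, lower])]
    return parent, potential

# Two well-separated 1-D groups: every In-Tree edge (i -> parent[i])
# stays inside its group except one long edge linking the sparser
# group's peak to the denser group; cutting that distinguishable
# inter-cluster edge recovers the two clusters.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [5.3]])
parent, potential = nearest_descent(X)
```

In this toy example the only edge crossing between the groups is also by far the longest one, which is the property the automatic edge-cutting step exploits.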

Highlights

  • Clustering, or cluster analysis, aims at discovering the natural groupings in a dataset [1]

  • Gaussian Mixture Model (GMM) is sensitive to initialization and unable to handle irregularly shaped clusters

  • We propose a visualized strategy to validate the effectiveness of each automatic edge-cutting method, which may help increase the reliability of the clustering results in practice


Introduction

Clustering, or cluster analysis, aims at discovering the natural groupings in a dataset [1]. K-means is sensitive to initialization, requires users to pre-specify the number of clusters, and is unable to handle non-spherical or unbalanced clusters. Another popular partitioning-based clustering method, Affinity Propagation (AP) [6], does not require users to specify the cluster number in advance, but it does require users to pre-define another non-trivial parameter (the so-called "preference"), for which an unsuitable setting may lead to over-partitioning. GMM is sensitive to initialization and unable to handle irregularly shaped clusters. Spectral clustering methods such as Normalized Cuts (Ncut) [9] and the method proposed by Ng, Jordan and Weiss (N-J-W) [10], and density-based clustering methods such as MeanShift [11–13] and DBSCAN [14], are widely used due to their advantages in detecting non-spherical clusters. However, Ncut and N-J-W involve time-consuming spectral decomposition [15], MeanShift involves time-consuming iteration [16–19], and DBSCAN is sensitive to parameter settings and unable to detect clusters of varying scales [20–22].

