Abstract

The goal of clustering analysis is to group a set of data points into clusters based on similarity or distance. In many traditional clustering algorithms, this similarity or distance is a scalar. A vector such as the data gravitational force, however, carries more information than a scalar and can be used to improve clustering performance. This paper therefore proposes a three-stage hierarchical clustering approach called GHC, which exploits the vector nature of the data gravitational force, a quantity inspired by the law of universal gravitation. In the first stage, a sparse gravitational graph is constructed from the top k data gravitations between each data point and its neighbors in the local region. In the second stage, the sparse graph is partitioned into many subgraphs by means of the gravitational influence coefficient. In the last stage, the final clustering result is obtained by iteratively merging these subgraphs with a new linkage criterion. To evaluate GHC, experiments are conducted on synthetic and real-world data sets, and the results show that GHC achieves better performance than the existing clustering algorithms it is compared with.

Highlights

  • Clustering is one of the major unsupervised learning techniques and has been applied in many fields, such as pattern recognition [1], image processing [2, 3], community detection [4, 5], bioinformatics [6, 7], and information retrieval [8, 9]. The main task of clustering is to partition a dataset into nonoverlapping clusters based on a suitable similarity metric, so that elements in the same cluster are similar while elements from different clusters are dissimilar.

  • An agglomerative hierarchical clustering algorithm starts with every element of a dataset as its own cluster. It then merges the closest clusters according to a linkage criterion in each iteration until all elements form one cluster. A divisive hierarchical clustering algorithm starts with the whole dataset as a single cluster, which is split into subclusters until every element forms its own cluster. The other differences among hierarchical clustering approaches come from the choice of similarity criteria and linkage criteria.

  • Data gravitational force, which is analogous to the universal gravitational force, is employed to cluster data points. We propose a novel hierarchical clustering algorithm based on a sparse gravitational graph, in which each vertex denotes an object of the data set and each edge denotes that a data gravitational force exists between its two vertices.
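The agglomerative procedure described in the highlights can be illustrated with a minimal sketch. It is not the GHC linkage criterion, which this page does not define. It assumes Euclidean distance and single linkage (the distance between the closest pair of members), and the function name `agglomerative_single_linkage` and its `n_clusters` stopping parameter are hypothetical choices made for illustration.

```python
import numpy as np

def agglomerative_single_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    ASSUMPTION: Euclidean distance and single linkage; the source does not
    prescribe either, they are chosen here only to make the idea concrete.
    """
    points = np.asarray(points, dtype=float)
    # Start with every element as its own cluster.
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    while len(clusters) > n_clusters:
        best = (np.inf, -1, -1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance of the closest pair of members.
                d = dists[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a and drop b.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Convert the list of member lists into one label per point.
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

if __name__ == "__main__":
    data = [[0, 0], [0, 1], [10, 10], [10, 11]]
    print(agglomerative_single_linkage(data, n_clusters=2))
```

Stopping at `n_clusters` rather than at a single cluster is a convenience for the example; running the loop to one cluster and recording each merge would give the full hierarchy.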

Summary

Introduction

Clustering is one of the major unsupervised learning techniques and has been applied in many fields, such as pattern recognition [1], image processing [2, 3], community detection [4, 5], bioinformatics [6, 7], and information retrieval [8, 9]. The main task of clustering is to partition a dataset into nonoverlapping clusters based on a suitable similarity metric, so that elements in the same cluster are similar while elements from different clusters are dissimilar. A divisive hierarchical clustering algorithm starts with the whole dataset as a single cluster, which is split into subclusters until every element forms its own cluster. Among the many multistage hierarchical clustering algorithms, Chameleon [16] is a representative approach that effectively detects clusters of arbitrary shape. We propose a novel hierarchical clustering algorithm based on a sparse gravitational graph, in which each vertex denotes an object of the data set and each edge denotes that a data gravitational force exists between its two vertices. The GHC algorithm detects clusters of arbitrary shape effectively and achieves excellent clustering performance on the synthetic and real-life data sets used in this study.
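A rough sketch of such a sparse gravitational graph is given below. The force formula is not stated in this summary, so the code assumes unit masses and a Newton-style force vector whose magnitude is 1/d² and whose direction points from one point to the other. It also assumes all points are distinct and uses every other point as the candidate neighborhood. The function name `sparse_gravity_graph` is hypothetical, and the paper's actual data gravitation model may differ.

```python
import numpy as np

def sparse_gravity_graph(points, k):
    """Keep, for each point, the k strongest gravitational forces acting on it.

    ASSUMPTIONS (not from the source): unit masses, force magnitude 1/d^2,
    distinct points, and all other points treated as candidate neighbors.
    Returns a dict mapping a directed edge (i, j) to the force vector that
    point j exerts on point i.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    k = min(k, n - 1)  # a point cannot attract itself

    # diff[i, j] = x_j - x_i, so the force on i points toward j.
    diff = points[None, :, :] - points[:, None, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # self-force has magnitude zero

    magnitude = 1.0 / dist**2

    edges = {}
    for i in range(n):
        # Indices of the k largest force magnitudes acting on point i.
        top = np.argsort(-magnitude[i])[:k]
        for j in top:
            unit_direction = diff[i, j] / dist[i, j]
            edges[(i, int(j))] = magnitude[i, j] * unit_direction
    return edges

if __name__ == "__main__":
    graph = sparse_gravity_graph([[0, 0], [1, 0], [5, 0]], k=1)
    for (i, j), force in graph.items():
        print(i, "->", j, force)
```

Storing the full force vector on each edge, not only its magnitude, reflects the abstract's point that a vector carries more information than a scalar distance.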

Related Work of Gravity-Based Clustering
Data Gravitation Model
Experiments
Conclusions