Abstract

As a ubiquitous method in machine learning, clustering has attracted a great deal of attention. Because only basic information about the data can be used, assigning data points to the correct categories is a challenging task, especially when the number of clusters is unknown. This paper presents an algorithm that finds the number of clusters automatically. It first constructs hyper-planes based on the margins between sample points. An adjacency relationship between data points is then defined, and connective components are derived from it. According to a validity index proposed in this paper, the high-quality connective components are selected as cluster centers, which also determines the number of clusters. Another contribution of this paper is that all the parameters of the algorithm can be set automatically. To evaluate its robustness, experiments are carried out on different kinds of benchmark datasets. They show that its performance is even better than the manually selected best results of some other methods.

Highlights

  • Clustering is a basic operation in the machine learning community

  • Based on the similarity of the data points, clustering algorithms put them into categories so that points in the same category are as similar as possible and as dissimilar as possible from points in other categories

  • Each dataset contains 1600 data points with added Gaussian noise

Summary

INTRODUCTION

Clustering is a basic operation in the machine learning community. Based on the similarity of the data points, clustering algorithms put them into categories so that points in the same category are as similar as possible and as dissimilar as possible from points in other categories.

Instead of finding the regions of high density directly, they derive the number of cluster centers by simulating the way a person views the peaks of mountains. The advantage of these two kinds of methods is that they are not sensitive to the distribution of data points [11].

The hyper-planes depend on the marginal space between global or local points in different clusters, so they may cross through some high-density regions. The approach can be applied to data points distributed on more general manifolds. Pseudocode for this process is presented in Algorithm 1.

B. DERIVING THE CONNECTIVE COMPONENTS

After the hyper-plane set H is found, the adjacency between data points can be determined.
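This extract does not state the adjacency rule itself, so the following is only a minimal sketch under an assumed rule: two points are treated as adjacent when no hyper-plane in H separates them. The representation of a hyper-plane as a pair (w, b) and the names `side`, `adjacent`, and `connective_components` are hypothetical and are not taken from the paper. The sketch groups points into the connected components of the resulting adjacency graph with a small union-find.

```python
from itertools import combinations


def side(point, plane):
    """Return +1 or -1 for the side of hyper-plane (w, b) the point lies on."""
    w, b = plane
    value = sum(wi * xi for wi, xi in zip(w, point)) + b
    return 1 if value >= 0 else -1


def adjacent(p, q, planes):
    """Assumed rule (not from the paper): no hyper-plane separates p and q."""
    return all(side(p, h) == side(q, h) for h in planes)


def connective_components(points, planes):
    """Group point indices into connected components of the adjacency graph."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the sets of every adjacent pair of points.
    for i, j in combinations(range(len(points)), 2):
        if adjacent(points[i], points[j], planes):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: the hyper-planes x = 0 and y = 0 split the plane into quadrants.
H = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
X = [(1.0, 1.0), (2.0, 0.5), (-1.0, 1.0), (-2.0, 2.0), (-1.0, -1.0)]
components = connective_components(X, H)
# components == [[0, 1], [2, 3], [4]]
```

According to the abstract, the components are then scored with a validity index to select cluster centers; that step is not sketched here because the index is not given in this extract.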

1: Repeat
DERIVING THE CLUSTERS
CONCLUSION AND FUTURE WORK
