Abstract

Clustering is an important research field in machine learning. Traditional clustering approaches are not very effective at handling clusters with overlapping regions. To better capture the three types of relationship between a sample and a cluster, namely belonging fully, belonging partially, and not belonging at all, this paper proposes a theory of similarity-based sample's stability and develops a three-step method for three-way clustering that integrates this stability into the idea of three-way clustering. In the proposed theory, the similarity of two samples is used to define their co-association frequency, and each sample's stability is calculated from these frequencies and a determinacy function. Using this stability, the universe is divided into a stable set and an unstable set. The samples in the stable set are assigned to the core regions of the clusters by a traditional clustering algorithm. The samples in the unstable set are assigned to the fringe region of the corresponding cluster according to the distances between the samples and the centers of the cluster core regions. A three-way clustering is thus formed naturally. Experimental results on data sets show that this method can improve the structure of the clustering results.
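
To make the three relationships concrete, the following minimal Python sketch represents one cluster as a pair of regions. The class name `ThreeWayCluster` and the method `relation` are hypothetical names chosen for illustration; they do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeWayCluster:
    """Illustrative container: a cluster as a core region plus a fringe region."""
    core: set = field(default_factory=set)    # samples that belong fully
    fringe: set = field(default_factory=set)  # samples that belong partially

    def relation(self, sample) -> str:
        """Return which of the three relationships holds for `sample`."""
        if sample in self.core:
            return "belongs fully"
        if sample in self.fringe:
            return "belongs partially"
        return "does not belong"


# Two clusters over sample indices 0..5.
c1 = ThreeWayCluster(core={0, 1, 2}, fringe={3})
c2 = ThreeWayCluster(core={4, 5})
relations = [c1.relation(s) for s in (0, 3, 4)]
```

In this representation a sample in the core region belongs fully, a sample in the fringe region belongs partially, and a sample in neither region does not belong to the cluster.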

Highlights

  • Data clustering is one of the most fundamental topics for data exploration in machine learning and plays an important role in many fields, such as information granulation, image analysis, and network structure analysis [1–4]. The purpose of clustering is to discover the underlying structure of a data set by organizing its samples into several clusters such that the objects within a cluster are highly similar to one another but remarkably dissimilar from objects in other clusters [5]. Over the past decades, researchers have studied the clustering problem extensively and developed many kinds of clustering algorithms, including partitional, hierarchical, density-based, and grid-based clustering

  • The samples in the stable set are assigned to the core region of each cluster by a traditional clustering algorithm. The samples in the unstable set are assigned to the fringe region of the corresponding cluster according to the distances between the samples and the centers of the cluster core regions. A three-way clustering is therefore formed naturally

  • We use the k-means algorithm to obtain the core region of each cluster. The samples in the unstable set are assigned to the fringe region of each cluster by the local co-association coefficient corresponding to the discovered core regions. The whole process is shown as Algorithm 4
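
The core and fringe assignment described in these highlights can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's Algorithm 4: the stable/unstable split is taken as a given boolean mask, a plain Lloyd's k-means stands in for the traditional clustering algorithm, and each unstable sample is placed in the fringe of the single nearest core center (the distance-based rule, not the local co-association coefficient). The function names `nearest_center`, `kmeans` and `three_way_assign` are hypothetical.

```python
import numpy as np


def nearest_center(X, centers):
    """Index of the closest center (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest_center(X, centers), centers


def three_way_assign(X, stable_mask, k, seed=0):
    """Assumed rule: k-means cores on stable samples, nearest-core fringes."""
    stable_mask = np.asarray(stable_mask, dtype=bool)
    stable_idx = np.flatnonzero(stable_mask)
    unstable_idx = np.flatnonzero(~stable_mask)

    # Core regions: cluster only the stable samples.
    labels, centers = kmeans(X[stable_idx], k, seed=seed)
    cores = [set(stable_idx[labels == j].tolist()) for j in range(k)]

    # Fringe regions: each unstable sample joins its nearest core's fringe.
    fringes = [set() for _ in range(k)]
    if len(unstable_idx) > 0:
        closest = nearest_center(X[unstable_idx], centers)
        for i, j in zip(unstable_idx.tolist(), closest.tolist()):
            fringes[j].add(i)
    return cores, fringes


# Two separated groups plus one in-between sample marked as unstable.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [4, 4]], dtype=float)
stable = np.array([True] * 6 + [False])
cores, fringes = three_way_assign(X, stable, k=2)
```

Restricting each unstable sample to one fringe is a simplification made here; a three-way clustering in general allows a sample to lie in the fringe regions of several clusters.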


Summary

Introduction

Data clustering is one of the most fundamental topics for data exploration in machine learning and plays an important role in many fields, such as information granulation, image analysis, and network structure analysis [1–4]. The purpose of clustering is to discover the underlying structure of a data set by organizing its samples into several clusters such that the objects within a cluster are highly similar to one another but remarkably dissimilar from objects in other clusters [5]. Over the past decades, researchers have studied the clustering problem extensively and developed many kinds of clustering algorithms, including partitional, hierarchical, density-based, and grid-based clustering.

Figure 2: A demonstration of a three-way cluster.

We develop a new three-way clustering algorithm based on similarity-based sample's stability. We use the similarity of two samples to define the co-association frequency and compute the sample's stability. The samples in the unstable set are assigned to the fringe region of the corresponding cluster according to the distances between the samples and the centers of the cluster core regions. Based on the similarity of two samples, a new definition of co-association frequencies is proposed, and its relation to the sample's stability is discussed. The similarity-based sample's stability measured by the proposed method is verified by experiments on UCI data sets.
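
The path from pairwise similarity to a stable set and an unstable set can be illustrated with the sketch below. This summary does not reproduce the paper's definitions, so every concrete choice here is an assumption made for illustration: a Gaussian similarity is used as the co-association frequency, the determinacy function is piecewise linear (0 at the threshold `t`, 1 at frequencies 0 and 1), a sample's stability is the mean determinacy of its frequencies with all other samples, and the parameters `sigma`, `t` and `threshold` are hypothetical.

```python
import numpy as np


def similarity_matrix(X, sigma=1.0):
    """Assumed co-association frequency: Gaussian similarity in (0, 1]."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def determinacy(p, t=0.5):
    """Assumed determinacy: 0 at p == t, rising linearly to 1 at p == 0 or 1."""
    p = np.asarray(p, dtype=float)
    return np.where(p < t, (t - p) / t, (p - t) / (1.0 - t))


def sample_stability(X, sigma=1.0, t=0.5):
    """Mean determinacy of each sample's frequencies with all other samples."""
    D = determinacy(similarity_matrix(X, sigma), t)
    np.fill_diagonal(D, 0.0)  # ignore each sample's relation to itself
    return D.sum(axis=1) / (len(X) - 1)


def split_stable(X, threshold, sigma=1.0, t=0.5):
    """Boolean mask of the stable set, plus the stability values."""
    stability = sample_stability(X, sigma, t)
    return stability >= threshold, stability


# Two tight groups on a line and one sample of ambiguous membership.
X = np.array([[0.0], [0.1], [5.0], [5.1], [1.2]])
stable_mask, stability = split_stable(X, threshold=0.7)
```

Under these assumptions, a sample whose frequencies with other samples are all close to 0 or 1 gets a stability near 1, while a sample with frequencies near `t` gets a low stability and falls into the unstable set.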

Similarity-Based Sample’s Stability
Step 1
Step 2
Step 3
Experimental Results
Data sets: S1, Zoo, Iris, Wine, Dermatology, Segmentation
Concluding Remarks