Abstract

Grouping objects based on their similarities is a common and important task in machine learning applications. Many clustering methods have been developed. Among them, k-means based methods are widely used, and several extensions, such as k-means++ and kernel k-means, have been developed to improve the original k-means method. K-means is a linear clustering method; that is, it divides the objects into linearly separable groups, while kernel k-means is a non-linear technique. Kernel k-means projects the elements to a higher-dimensional feature space using a kernel function and then groups them. Different kernel functions may perform differently on the same data set, so choosing the right kernel for an application can be challenging. In our previous work, we introduced a weighted majority voting method for clustering based on normalized mutual information (NMI). NMI is a supervised criterion: calculating it requires the true labels of a training set. In this study, we extend our previous work on aggregating clustering results by developing an unsupervised weighting function for cases where a training set is not available. The proposed weighting function is based on the Silhouette index, an unsupervised criterion, so no training set is required to calculate it. This makes the new method more consistent with the unsupervised nature of clustering.

Highlights

  • Cluster analysis has been widely applied for dividing objects into different groups based on their similarities [2]

  • There is a high demand for developing new methods to discover hidden structures, identify patterns, and recognize different groups in machine learning applications [1]

  • We showed that the clustering results highly depend on the selected kernel function when using kernel k-means method


Summary

Introduction

Cluster analysis has been widely applied for dividing objects into different groups based on their similarities [2]. It is an unsupervised learning method [5] that optimizes an objective function based on feature similarities [6]. Clustering algorithms often use a search method to optimize the objective function, which is done by minimizing the distance of elements to their cluster centers (within-cluster distance) and/or maximizing the distance between cluster centers (between-cluster distance). New cluster centers are obtained by averaging all elements grouped in the same cluster in Step 2. The k-means objective function can be written as $\sum_{k=1}^{K} \sum_{x_i \in \pi_k} \| x_i - \mu_k \|^2$, where $\pi_k$ is cluster $k$, $\mu_k$ is the center of cluster $k$, and $\| \cdot \|$ is the Euclidean norm.
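As a minimal sketch of the k-means objective and the center update described above, the following NumPy snippet evaluates the within-cluster sum of squared distances for a fixed assignment. The function names, the toy data, and the assumption that every cluster is non-empty are illustrative choices, not taken from the paper.

```python
import numpy as np


def update_centers(X, labels, n_clusters):
    """Return the mean of the points assigned to each cluster.

    Assumes (for illustration) that every cluster has at least one point.
    """
    return np.vstack([X[labels == k].mean(axis=0) for k in range(n_clusters)])


def kmeans_objective(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    diffs = X - centers[labels]  # row i is x_i minus the center of its cluster
    return float(np.sum(diffs ** 2))


# Toy data: two well-separated pairs of points (hypothetical example).
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])

centers = update_centers(X, labels, n_clusters=2)
objective = kmeans_objective(X, labels, centers)
```

Here each point lies at distance 1 from its cluster center, so the objective is 4. A full k-means run would alternate between reassigning points to the nearest center and recomputing the centers until the objective stops decreasing.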

