Abstract

The k-means algorithm is the best-known and most widely used clustering method, and many extensions of it have been proposed in the literature. Although k-means is nominally an unsupervised learning method for clustering in pattern recognition and machine learning, the algorithm and its extensions always depend on initializations and require the number of clusters to be given a priori. In this sense, the k-means algorithm is not a fully unsupervised clustering method. In this paper, we construct an unsupervised learning schema for the k-means algorithm so that it is free of initializations and parameter selection and can simultaneously find an optimal number of clusters. That is, we propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without any initialization or parameter selection. The computational complexity of the proposed U-k-means clustering algorithm is also analyzed, and comparisons are made between the proposed U-k-means and other existing methods. Experimental results and comparisons demonstrate these advantages of the proposed U-k-means clustering algorithm.

Highlights

  • Clustering is a useful tool in data science

  • Most clustering algorithms, including k-means, are run to produce different numbers of clusters with their associated cluster memberships; these clustering results are then evaluated by multiple validity measures to determine the most practically plausible result and the estimated number of clusters [13]

  • In this paper, we propose a new schema with a learning framework for the k-means clustering algorithm
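The model-selection workflow described in the second highlight, running k-means for several candidate numbers of clusters and scoring each result with a validity measure, can be sketched as below. This is a generic illustration rather than the paper's method; the silhouette coefficient is used here as one common internal validity index, and all function names are our own:

```python
import numpy as np

def kmeans(X, k, n_iter=100, n_init=5, seed=0):
    """Plain Lloyd's k-means with a few random restarts; returns labels and SSE."""
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assignment step: nearest center by squared Euclidean distance.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(1)
            # Update step: each center moves to the mean of its points.
            new = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        sse = ((X - centers[labels]) ** 2).sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels, best_sse

def silhouette(X, labels):
    """Mean silhouette coefficient (one common internal validity index)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    n, s = len(X), np.empty(len(X))
    for i in range(n):
        same_others = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, same_others].mean() if same_others.any() else 0.0
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def select_k(X, k_range=range(2, 6)):
    """Run k-means for each candidate k; keep the best-scoring clustering."""
    scores = {k: silhouette(X, kmeans(X, k)[0]) for k in k_range}
    return max(scores, key=scores.get)
```

Note the extra outer loop over `k_range`: every candidate k requires a full clustering run, which is exactly the overhead the paper's learning schema aims to remove.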


Summary

INTRODUCTION

Clustering is a useful tool in data science. It is a method for finding cluster structure in a data set, characterized by the greatest similarity within the same cluster and the greatest dissimilarity between different clusters. In approaches such as X-means, users must specify a range of cluster numbers in which the true cluster number reasonably lies, and a model-selection criterion, such as BIC or AIC, is used to guide the splitting process. Although such k-means clustering algorithms, based on cluster validity indices or X-means, can find the number of clusters, they use extra iteration steps outside the clustering algorithm itself. No work in the literature for k-means is simultaneously free of initializations and parameter selection while also finding the number of clusters; we suppose this is due to the difficulty of constructing such a k-means algorithm. We therefore first construct a learning procedure for the k-means clustering algorithm that can automatically find the number of clusters without any initialization or parameter selection.
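The learning procedure itself is not detailed in this excerpt, but the general idea of letting the data determine the number of clusters, rather than looping over candidate values of k, can be illustrated with a toy sketch: over-provision centers and iteratively discard clusters that attract too few points. This is our own simplified heuristic, not the authors' U-k-means algorithm, and the `min_frac` threshold is an assumed tuning knob standing in for a principled learning schema:

```python
import numpy as np

def auto_kmeans(X, k_init=None, min_frac=0.3, n_iter=200, seed=0):
    """Toy illustration only: start with deliberately many centers, then
    prune clusters that attract fewer than min_frac * n points, so the
    final number of clusters emerges from the data rather than being
    fixed in advance. Note min_frac caps the discoverable number of
    clusters at roughly 1 / min_frac."""
    rng = np.random.default_rng(seed)
    k = k_init or max(2, int(np.sqrt(len(X))))
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        counts = np.bincount(labels, minlength=len(centers))
        # Prune the most starved cluster, one per pass, then reassign.
        if len(centers) > 1 and counts.min() < min_frac * len(X):
            centers = np.delete(centers, counts.argmin(), axis=0)
            continue
        # Update step: each surviving center moves to its points' mean.
        new = np.array([X[labels == j].mean(0) for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

Unlike the validity-index workflow, a single run both clusters the data and settles on a cluster count, which is the kind of behavior the paper's unsupervised schema formalizes.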

RELATED WORKS
THE UNSUPERVISED K-MEANS CLUSTERING
EXPERIMENTAL RESULTS AND COMPARISONS
CONCLUSION