Seed selection algorithm through K-means on optimal number of clusters

Kuntal Chowdhury,Debasis Chaudhuri,Arup Kumar Pal,Ashok Samal

doi:10.1007/s11042-018-7100-4

Abstract

Clustering is one of the important unsupervised learning in data mining to group the similar features. The growing point of the cluster is known as a seed. To select the appropriate seed of a cluster is an important criterion of any seed based clustering technique. The performance of seed based algorithms are dependent on initial cluster center selection and the optimal number of clusters in an unknown data set. Cluster quality and an optimal number of clusters are the important issues in cluster analysis. In this paper, the proposed seed point selection algorithm has been applied to 3 band image data and 2D discrete data. This algorithm selects the seed point using the concept of maximization of the joint probability of pixel intensities with the distance restriction criteria. The optimal number of clusters has been decided on the basis of the combination of seven different cluster validity indices. We have also compared the results of our proposed seed selection algorithm on an optimal number of clusters using K-Means clustering with other classical seed selection algorithms applied through K-Means Clustering in terms of seed generation time (SGT), cluster building Time (CBT), segmentation entropy and the number of iterations (NOTK−means). We have also made the analysis of CPU time and no. of iterations of our proposed seed selection method with other clustering algorithms.

Full Text