Abstract

Clustering is an essential unsupervised technique when category information is not available. Although the K-means and Max-min distance K-means clustering algorithms are widely used, they have disadvantages such as dependence on the initial centers and sensitivity to outliers, the latter caused by using distance as the only clustering criterion. To address these problems, this paper proposes the SMM-K-means algorithm, which overcomes the dependence on the initial cluster centers and the initial number of clusters, as well as the sensitivity to outliers. First, the initial value K of the optimal cluster number is determined by the elbow method, and K-means is used for initial clustering. A new inter-cluster separation measure is then constructed based on the idea of q-nearest neighbors; it jointly considers the separation between clusters and the compactness of the distribution within each cluster. Finally, the two sample points with the highest degree of separation are passed to the Max-min distance K-means algorithm as new initial centers for clustering. Because the cluster centers are determined deterministically, the complicated iterative calculation is eliminated, and the inter-cluster separation measure overcomes the sensitivity of the clustering results to noise points and isolated points, giving the method good applicability and generalization. In addition, the algorithm is not limited by the shape and size of the clusters, which makes it more flexible. The experimental results show that the SMM-K-means algorithm achieves higher Calinski-Harabasz (CH) values, indicating a better clustering effect and stability.
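
The last two stages described above (seeding a Max-min distance K-means run with two well-separated points, then scoring the result with the CH index) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the q-nearest-neighbor separation measure, so the sketch assumes a simple stand-in that picks the two mutually farthest points as seeds. The function names (`farthest_pair`, `max_min_centers`, `kmeans`, `calinski_harabasz`), the toy data, and the choice k = 3 are all hypothetical, and the elbow-method step is omitted.

```python
import math
from itertools import combinations

def farthest_pair(points):
    # ASSUMPTION: stand-in for the paper's separation measure, which the
    # abstract does not specify. Picks the two mutually farthest points.
    return max(combinations(points, 2), key=lambda pair: math.dist(*pair))

def max_min_centers(points, k, seeds):
    # Max-min distance seeding: each new center is the point whose distance
    # to its nearest already-chosen center is largest.
    centers = list(seeds)
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]

def kmeans(points, centers, iters=100):
    # Plain Lloyd iterations starting from the given centers.
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else c)
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers

def calinski_harabasz(points, labels, centers):
    # CH = (between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k)).
    # Undefined when the within-cluster dispersion is zero or k == 1.
    n, k = len(points), len(centers)
    overall = tuple(sum(x) / n for x in zip(*points))
    between = sum(labels.count(j) * math.dist(c, overall) ** 2 for j, c in enumerate(centers))
    within = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return (between / (k - 1)) / (within / (n - k))

# Toy data: three well-separated groups of four points each (hypothetical).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]

seeds = farthest_pair(points)
initial_centers = max_min_centers(points, 3, seeds)
labels, centers = kmeans(points, initial_centers)
ch = calinski_harabasz(points, labels, centers)
print(labels, centers, ch)
```

Because both the seeds and the subsequent centers are chosen deterministically, repeated runs on the same data give the same partition, which is the property the abstract attributes to fixing the initial centers. A higher CH value indicates clusters that are compact relative to their separation.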
