Abstract

The k-means clustering algorithm is a classical clustering algorithm that is widely used because of its efficiency and performance. The algorithm uses Euclidean distance to measure the similarity between samples and iteratively updates the membership matrix to obtain the clustering result. However, when k-means clusters a dataset containing samples whose intra-cluster distances are greater than inter-cluster distances, it often mispartitions the boundary samples, which ultimately leads to unsatisfactory results. Moreover, although k-means makes the intra-cluster distance as small as possible, it neglects to maximize the inter-cluster distance, and therefore often finds only a locally optimal solution. Unlike existing k-means-type algorithms, this paper proposes a similarity measure based on an impact factor, which determines the partitioning result by comparing the impact of each sample on each cluster. In addition, building on the objective function of the k-means algorithm, we incorporate the inter-cluster distance to remedy the local-optimality defect of k-means. We theoretically analyze and prove the proposed method, compare the clustering results of the algorithm with those of k-means-type algorithms on real datasets, and confirm that the proposed algorithm can effectively avoid the defects of k-means-type algorithms.
