Cluster Center Initialization and Outlier Detection Based on Distance and Density for the K-Means Algorithm

Qi He,Chuan Zhao,Zhenxiang Chen,Ke Ji,Yuliang Shi,Kun Ma,Lin Wang

doi:10.1007/978-3-030-16657-1_49

Abstract

K-means algorithm, the most classic partition-based clustering method, has its disadvantages. If there are outliers in the data sets, the K-means algorithm may lead to serious deviation of the mean value. In addition, random initialization is very sensitive to the input data parameters. In this paper, we propose initialization and outlier detection based on distance and density for the K-means algorithm (KMIDDO), an improvement method to optimize the initial center points, especially it has more effective in the case of outliers. What’s more, we extend an outlier detection method to improve the clustering effect. We hope the distance between every two center points is as far as possible and the density of the center points are as large as they can. In terms of initialization, we calculate the distance and density of points. In the outliers detection, we take the outliers as a single class based on the distance and density. Experiments are conducted to illustrate the effectiveness and accuracy of the proposed algorithms on several synthetic and real datasets.

Full Text