K-means clustering for the analysis of incomplete business data

An Qi, Ximing Ma

doi:10.25236/ajcis.2021.040606

Abstract

Missing values can significantly reduce the accuracy and availability of business data. When incomplete data are clustered, records with missing values are usually deleted and only the complete records are analyzed, which often leads to significant loss or distortion of information. This paper studies how unsupervised machine learning techniques can be used to deal with missing values. Combining imputation with clustering yields a way of handling missing values that helps overcome the missing-data problem. We propose a strategy that pairs each of three clustering algorithms (K-means, big data K-means, and p-k-means) with each of three imputation methods (mean imputation, singular value decomposition imputation, and k-nearest neighbor imputation). The nine resulting methods are compared on different business data sets, with the experimental analysis carried out on four benchmark data sets. The experiments verify the effectiveness of combining K-means clustering with imputation on different data sets, and the results suggest that the approach has application prospects.
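As an illustration of the simplest of these combinations, the following minimal sketch applies mean imputation and then plain K-means (Lloyd's algorithm) to a small toy data set. It is not the authors' implementation: the function names `mean_impute` and `kmeans`, the toy values, and the fixed random seed are assumptions made for this example, and big data K-means, p-k-means, SVD imputation, and k-nearest neighbor imputation are not shown.

```python
import math
import random


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy records with two features; None marks a missing value.
data = [
    [1.0, 2.0],
    [1.2, None],
    [0.8, 1.9],
    [9.0, 10.0],
    [None, 10.2],
    [9.1, 9.8],
]

completed = mean_impute(data)
labels, centroids = kmeans(completed, k=2)
print(completed)
print(labels)
```

Imputing before clustering keeps all six records in the analysis, whereas deleting incomplete records would leave only four. The trade-off is that mean-imputed values are pulled toward the center of the data, which is one reason to compare it against other imputation methods.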

Full Text