Improved K-Nearest Neighbor Missing Data Classification Based on Interval Value Imputation

Ziyang Zhang, Can Tang

doi:10.1109/eebda56825.2023.10090609

Abstract

In practice, data sets used for classification often contain missing values, which complicates the classification task. To address this, this paper proposes a method that imputes incomplete data with interval values derived from the K nearest neighbors. The method uses the Euclidean distance between an incomplete sample and the complete samples to find the K complete samples closest to it, and then constructs a nearest-neighbor interval for each missing attribute value from the corresponding attribute values of those complete samples. The data set is thereby converted into an interval-valued data set. Using an interval-valued distance algorithm, the incomplete data can then be classified with the K-nearest neighbor algorithm. Experimental results show that, under certain circumstances, the improved K-nearest neighbor algorithm based on interval value imputation is more efficient than traditional K-nearest neighbor algorithms that use zero-value, median, or mean imputation.
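The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: the neighbor search uses Euclidean distance over the observed attributes only, each missing value is replaced by the [min, max] range of the K neighbors' values, and the interval-valued distance is a Euclidean distance on interval endpoints. The function names `impute_intervals`, `interval_distance`, and `knn_classify` are hypothetical and are not taken from the paper.

```python
import numpy as np

def impute_intervals(X, k=3):
    """Turn a data set with NaNs into lower/upper bound arrays.

    Observed values become degenerate intervals [x, x]. Each missing value
    becomes [min, max] of that attribute over the k nearest complete rows
    (assumed interval construction; the paper may define it differently).
    """
    X = np.asarray(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]
    lo, hi = X.copy(), X.copy()
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        # Assumption: distance is computed on the observed attributes only.
        d = np.sqrt(((complete[:, ~miss] - row[~miss]) ** 2).sum(axis=1))
        nn = complete[np.argsort(d, kind="stable")[:k]]
        lo[i, miss] = nn[:, miss].min(axis=0)
        hi[i, miss] = nn[:, miss].max(axis=0)
    return lo, hi

def interval_distance(lo_a, hi_a, lo_b, hi_b):
    """Assumed interval distance: Euclidean distance on both endpoints.

    For degenerate intervals (lo == hi) it reduces to the ordinary
    Euclidean distance between the two points.
    """
    return np.sqrt((((lo_a - lo_b) ** 2 + (hi_a - hi_b) ** 2) / 2).sum())

def knn_classify(lo_train, hi_train, y_train, lo_q, hi_q, k=3):
    """Majority vote among the k interval-valued training rows nearest to the query."""
    d = np.array([interval_distance(lo_q, hi_q, l, h)
                  for l, h in zip(lo_train, hi_train)])
    nearest = np.argsort(d, kind="stable")[:k]
    labels, counts = np.unique(np.asarray(y_train)[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: the last row is missing its second attribute.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0], [1.5, np.nan]])
y = np.array([0, 0, 0, 1])  # labels of the four complete rows

lo, hi = impute_intervals(X, k=2)
pred = knn_classify(lo[:4], hi[:4], y, lo[4], hi[4], k=3)
```

In this toy example the two complete rows nearest to the incomplete one are `[1, 1]` and `[2, 2]`, so the missing attribute is represented by the interval [1, 2] rather than by a single imputed number, and the classification step then operates on intervals throughout.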

Full Text