Abstract

Missing data is a prevalent issue in various real‐world systems and may degrade the performance of classification algorithms running on these platforms. Numerous effective imputation methods exist to address this problem. However, traditional data imputation approaches mainly focus on low‐dimensional missing data, and they do not exploit the randomness of the missing values and the label information simultaneously. To solve these problems, the authors propose a novel data imputation algorithm, named Particle Swarm Optimization for High‐dimensional mixed Missing variables data (PSOHM). PSOHM first applies a feature filtering algorithm to select features from the incomplete data, then applies a feature discrimination method to categorize the chosen features as continuous or discrete. PSOHM then employs particle swarm optimization to optimize the imputation functions for both kinds of features. Continuous features are modelled as Gaussian distributions, with the mean and standard deviation encoded into the particles; for discrete features, the probabilities of their values are encoded as well. Classification accuracy serves as the optimization objective, so that both the randomness of the missing values and the label information are used to improve the algorithm's performance. PSOHM is compared with six typical algorithms. The results demonstrate that the authors' method is superior to the compared approaches on six different kinds of classical datasets.
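The particle encoding described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`decode_particle`, `impute`, `fitness`), the flat particle layout, and the nearest‐centroid classifier used to score accuracy are all assumptions made for illustration, and the feature filtering step and the swarm update loop are omitted.

```python
import numpy as np

def decode_particle(particle, n_cont, disc_sizes):
    """Split a flat particle into Gaussian parameters and discrete probabilities.

    Assumed layout: [means | stds | probabilities for each discrete feature].
    """
    means = particle[:n_cont]
    stds = np.abs(particle[n_cont:2 * n_cont]) + 1e-8  # keep std positive
    probs, start = [], 2 * n_cont
    for k in disc_sizes:
        raw = np.abs(particle[start:start + k]) + 1e-8
        probs.append(raw / raw.sum())  # normalise to a probability vector
        start += k
    return means, stds, probs

def impute(X, particle, cont_idx, disc_idx, disc_values, rng):
    """Fill NaNs by sampling from the distributions encoded in the particle."""
    X = X.copy()
    sizes = [len(v) for v in disc_values]
    means, stds, probs = decode_particle(particle, len(cont_idx), sizes)
    for j, col in enumerate(cont_idx):
        mask = np.isnan(X[:, col])
        X[mask, col] = rng.normal(means[j], stds[j], size=mask.sum())
    for j, col in enumerate(disc_idx):
        mask = np.isnan(X[:, col])
        X[mask, col] = rng.choice(disc_values[j], size=mask.sum(), p=probs[j])
    return X

def fitness(X, y, particle, cont_idx, disc_idx, disc_values, seed=0):
    """Accuracy of a nearest-centroid classifier on the imputed data.

    The classifier is a stand-in chosen for brevity (an assumption);
    any classifier could be scored here instead.
    """
    rng = np.random.default_rng(seed)
    Xf = impute(X, particle, cont_idx, disc_idx, disc_values, rng)
    classes = np.unique(y)
    centroids = np.stack([Xf[y == c].mean(axis=0) for c in classes])
    dists = ((Xf[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y).mean())
```

In this sketch a particle swarm optimizer would treat `fitness` as the objective to maximize, so that the encoded means, standard deviations, and value probabilities drift toward imputations that help the classifier separate the labels.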

