Abstract

Feature selection (FS) in data with class imbalance or missing values has received much attention from researchers due to their universality in real-world applications. However, for data with both the two characteristics above, there is still a lack of the corresponding FS algorithm. Due to the complex coupling relationship between missing data and class imbalance, the need for better FS method becomes essential. To tackle high-dimensional imbalanced data with missing values, this article studies a new evolutionary FS method. First, an improved <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$F$ </tex-math></inline-formula> -measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on the performance of FS in the case of class imbalance. Following that taking the RF-measure as an objective function, a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC) is proposed. Two new problem-specific operators or strategies, i.e., the swarm initialization strategy guided by fuzzy clustering and the local pruning operator based on feature importance, are developed to improve the performance of PSOFS-FC. Compared with state-of-the-art FS algorithms on several public datasets, experimental results show that PSOFS-FC can achieve excellent classification performance with relatively less running time, indicating its superiority on tackling high-dimensional imbalanced data with missing values.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call