Abstract

Identifying the most informative features is a crucial step in feature selection. This paper focuses primarily on wrapper feature selection methods designed to detect important features with the F1-score as the target metric. As an initial step, most wrapper methods order features according to importance. In most cases, however, the importance is defined by the classification method used and varies with the characteristics of the data set. Using synthetically simulated data, we examine four existing feature ordering techniques to find the most effective ordering mechanism for identifying informative features. Based on these results, an improved method is suggested for extracting the most informative feature subset from the data set. The method orders the features by the sum of the absolute values of the first k principal component loadings, where k is a user-defined, application-specific value, and then applies a sequential feature selection method to extract the best subset of features. We further compare the performance of the proposed feature selection method with the existing Recursive Feature Elimination (RFE) by simulating data for several practical scenarios with different numbers of informative features and different imbalance rates. We also validate the method on a real-world application across several classification methods. The results, based on accuracy measures, indicate that the proposed approach performs better than the existing feature selection methods.
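A minimal sketch of the ordering-and-selection idea described above, not the authors' implementation: it assumes the loadings are taken as scikit-learn's PCA component weights, uses a logistic regression wrapper, and scores forward sequential selection by F1; the value of k, the classifier, and the selection direction are all application-specific choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

def pca_loading_order(X, k):
    """Rank features by the sum of absolute loadings on the first k principal components."""
    pca = PCA(n_components=k).fit(X)
    # components_ has shape (k, n_features); sum absolute weights per feature
    scores = np.abs(pca.components_).sum(axis=0)
    # indices of features, most important first
    return np.argsort(scores)[::-1]

# Illustrative usage (X, y are a feature matrix and class labels; k = 5 is arbitrary):
# order = pca_loading_order(X, k=5)
# sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
#                                 scoring="f1", direction="forward")
# sfs.fit(X[:, order], y)
# selected_features = order[sfs.get_support()]
```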
